outcomes, crime and related behavior, and items linked to transitioning to 
The paper describes the basic model, which identifies household resources, 
particularly income, and distinguishes between transmissions and processes 
broadly within the family and individual sphere. It then examines possible 
data requirements and mechanisms in more detail, evaluating how far the 
available data sources include the appropriate data. After discussing 
neighborhood effects and data on service quality, the paper explores the 
technical questions of data linkage and ethical and legal constraints. It 
concludes that there is not one study that encompasses all that would be 
needed to chart and explain the relationship between poverty in childhood and 
the major outcomes in the short and medium term, though with the increasing 

CHAPTER 1 - INTRODUCTION 



1.1 Background to the scoping study 



This paper was jointly funded by the Department for Work and Pensions 
(DWP) (formerly Department of Social Security), HM Treasury (HMT) and the 
Social Exclusion Unit (SEU). In addition, funding was provided by the 
Treasury's Evidence Base Policy Fund. The tender brief for this study 
requested a scoping paper 'to outline different research strategies, based on 
existing data, new data or both, to help understand the processes that link low 
incomes, deprived neighbourhoods and adverse outcomes for children across 
generations'. The intention was to provide ‘both a short and long-term 
framework to commission further research to inform policy development... and 
to shed light on the relative importance of policies to raise incomes, improve 
public services and tackle the problems in deprived neighbourhoods.' The last 
point was a specific reference to the growing number of area-based initiatives 
(ABIs) that are targeted at poor neighbourhoods or poor children or both, and 
from which evaluation data are beginning to emerge. 

The objective for the scoping study was explicitly not to review current 
knowledge about child poverty based on an overview of existing research. A 
number of studies that cover this ground extensively have been published 
since the scoping study was commissioned. These include Jonathan 
Bradshaw’s edited collection Poverty: the outcomes for children (Bradshaw et 
al., 2001); Brewer and Gregg’s ‘Eradicating Child Poverty in Britain: Welfare 
Reform and Children Since 1997’ (IFS Working Paper, May 2001), as well as 
forthcoming studies such as Bradbury et al. (2001). 

Rather than simply add to this collection, the aim was to identify gaps and 
limitations in existing datasets and research strategies that might be the basis 
for further research that could build on existing studies, for example by data 
enhancement or data linkage between one or more sets of data, or by newly 
commissioned data collection exercises. This exercise was to be informed by 
a more technical discussion of the types of data required to answer some of 
these more challenging long-term questions about child poverty and child 
outcomes. Duncan et al.’s (1998) analysis of longitudinal data linking 
childhood poverty and subsequent life chances, drawing on the Panel Study 
of Income Dynamics (PSID) in the US with data on children born between 
1967 and 1973 and followed from birth to age 20, is one example of this type, where 
data on low income in early childhood was linked to subsequent educational 
progress. 

The main thrust of this review therefore encompasses both these key themes 
- that is, it reviews current and likely future datasets and developments, and 
makes a technical assessment of what would ideally be required. While the 
intention is to make this as comprehensive and complete as possible, we 
would not claim in the limits of a short study to have uncovered every initiative 
or likely development in the pipeline nor indeed all the ways of handling 
possible future research. 

The aims of the study were therefore: 

1. to clarify the key issues; 

2. to construct a framework for research; 

3. to review the data available, and the scope for data linkage to study 
child poverty and child outcomes; 

4. to make recommendations about possible future research 
developments. 



1.2 Policy relevance 



While the scoping study was explicitly not a review of research findings we 
were encouraged to keep the relevance to policy firmly in mind, and to locate 
this technical research exercise in the wider framework of policy objectives of 
sharply reducing and finally eliminating child poverty in a twenty year time 
frame. 

The existence and persistence of child poverty in Britain are now very well 
documented from cross-sectional surveys, panel studies and administrative 
data. The steep rise in child poverty since the latter part of the 1970s through 
to the late 1990s is also very well documented, as are the exceptionally high 
rates of child poverty in Britain in comparison to other European countries 
(DSS, 2000). As a concomitant of these high rates the intense geographical 
concentration of poor children is also increasingly well mapped (Noble, Evans 
et al., 2001). What is less well understood are the links over time between the 
experiences of child poverty, medium and long-term child outcomes, the 
intervening effects of 'within family’ processes, local services and 
neighbourhoods, particularly the high geographical concentrations of poor 
families in some areas. Duncan et al. (1998) draw attention to the ‘surprising 
volatility’ of family incomes in the US, a finding which appears to be emerging 
from studies in the UK that have repeated data over short intervals of time. 
This more dynamic picture provides the critical backdrop for more effective 
policies to reduce and eliminate child poverty. 

There are several additional issues to be kept in mind in this discussion of the 
possible links between the experience of growing up in poverty and 
subsequent outcomes. First, there may not be a single critical time point for all 
significant outcomes. Thus, the repeated finding about the importance of the 
early years for significant intervention may apply strongly to early cognitive 
skills and subsequent educational achievement. Certainly Duncan et al. 
(1998) suggest from their PSID data that family income levels for ages 0-5 
may have a strong influence on educational progress, but have much less 
impact on some health and behavioural outcomes. But students of the 
long-term effects of preschool intervention, studies that cover 20 years of 
development or more, will know that subsequent events are also important in 
sustaining or undermining these early gains (Schweinhart et al., 1993). And 
there may well be other critical periods such as the decision to stay on at or 
leave school at the minimum age, or the series of transitions into adult life (in 
employment, housing, family etc.). This possibility - of different critical periods 
for different outcomes - immediately introduces substantial complexity into any 
dataset that might meet the many potential data requirements to answer this 
set of questions. 

Second, intensive long-term evaluation studies of the effects of early 
intervention on later development underline the way that initial interventions 
do not have simple linear effects (as originally might have been assumed) and 
may act through intermediate routes, for example, by strengthening parental 
support at critical periods rather than by enhancing children’s cognitive skills. 

It may be that the growing number of major evaluations commissioned to 
study interventions such as Sure Start aimed at children growing up in poverty 
will, in time, bring in more information about these patterns to supplement the 
picture emerging from national cohort and panel studies. 

Third, it may well be that these experiences of poverty are different for 
different groups who may be categorised under the general heading of 'child 
poverty’: for example different ethnic groups and family types, as well as those 
living in urban or rural areas and in different regions. There are also groups 
such as those with special educational needs, children with disabilities or 
those growing up in care whose experiences might be different. In all these 
cases, much of the data available with a longitudinal component do not 
contain the necessary classifications or numbers to follow these groups in any 
detail. Again this adds a substantial complexity to any potential dataset. Can it 
contain enough cases of these different groups over time to permit different 
patterns or trends to emerge, and their possible causes and correlates? 

Finally, there is the question of the relative importance of the different levels - 
individual, family, neighbourhood and services - on these different outcomes. 
Apart from the conceptual question ‘what is a neighbourhood effect?’, there 
are design considerations about how best to take these questions into 
account. Although there is some dispute about the relative importance of 
these components (Kleinman, 1999; Berthoud, 2001; McCulloch, 2001) our 
position is that the issue of ‘neighbourhood effects’ is yet to be resolved and 
the resolution will depend on appropriate data, measurement and analysis. 

Some of the key policy relevant questions might therefore include: 

• what are the processes or ‘pathways’ that link low income, deprived 
neighbourhoods and poor outcomes for children (particularly linking 
poverty measures to child outcomes at the individual level)? 

• what are the key transmission mechanisms (for instance, within the 
family)? 

• how different are the pathways for different outcomes and for different 
groups of children? 

• are there key stages (e.g. the early years)? Do these vary for different 
types of child outcome? 



• what are the consequences of different levels of exposure to poverty 
(length of time, intensity)? 

• is the ‘poverty’ experienced by children best measured by household 
income? 

• in addition to measuring household income, how far is it possible to 
measure the ways that such income impinges on children (i.e. do they get 
more or less than their ‘fair share’ of household resources)? 

• what is the impact of assets and access to financial resources (‘financial 
exclusion’) in the light of the growing policy interest in the impact of access 
to such assets? 

• are there clear ‘cut-off points’ where children living in persistently poor 
circumstances have significantly higher risks of poor outcomes? And how 
far does this lend support to area or individually targeted programmes? 

• what is the impact of geographical concentrations of poor families and 
children? 

• are there advantages in growing up in socially mixed rather than in 
socially uniform neighbourhoods? 

• what is the relative impact on long-term outcomes of policies aimed at 
raising incomes, or policies to improve local services? 

• how far is there an independent contribution to life chances, from sets of 
attitudes and orientations towards opportunities - for example in exercising 
choice in service provision (e.g. choice of school)? 

• how far can comparative studies with other countries throw light on these 
mechanisms, given the UK’s apparently poor record relative to many other 
European countries in the level and persistence of child poverty? 

• how far can simulation methods of the effects of tax and benefit changes 
(e.g. Polimod and Euromod - Piachaud and Sutherland, 2001) contribute 
to the assessment of child poverty and the impact of potential future 
policies? 



1.3 Scope and coverage 



We have taken age 0-20 as the potential range to cover children and young 
persons, though there are arguments for extending this selectively to age 24 in 
view of the extended transition period for many young people. And we have 
included most of the major ‘outcomes’. We set out four domains - education, 
health and psychological outcomes, crime and related behaviour and a 
collection of items linked to the transition to adult life, such as 
(un)employment, homelessness and early family formation. Much of our 
coverage in these areas must necessarily be illustrative. Our coverage is also 
largely restricted to the UK, and several of the studies listed cover only parts 
of the UK. We have not covered comparative material, though we would 
underline its value in exposing some of the special features of the position in 
the UK (e.g. Jenkins and Schluter, 2000), and possibly the consequences of 
different mixes of policy on levels of child poverty. There is also scope for 
comparing the findings of studies such as Duncan et al. (1998) with similar 
work in the UK. Following the growing emphasis on the problem of child 
poverty across Europe, a number of initiatives are underway. Bradshaw and 
colleagues at the University of York are comparing child benefit packages 
across 22 countries, including all EU countries. An EU COST Action 19 
programme on child wellbeing started in 2001, and Bradshaw and colleagues 
have also established a cross national database of economic indicators of 
child poverty for the multinational project on measuring and monitoring 
children’s well-being, available from Bradshaw at York University (www.user- 
users.york.ac.uk/~irb1/current.htm ). 

In Chapter 2 we set out our basic model. This identifies household resources, 
particularly income, broadly defined, and it distinguishes transmissions and 
processes broadly within the family and individual sphere from 
neighbourhood and area characteristics. In Appendix 1, we consider 
some of the statistical issues raised by the model. In Chapter 3 we then look 
at the possible data requirements and mechanisms in more detail and 
evaluate how far the available data sources include the appropriate data. 
Appendix 2 sets out, in tabular form, brief descriptions of these datasets. In 
Chapter 4 we look at neighbourhood effects and data on service quality. 
Appendix 3 considers some of the technical issues. Chapter 5 explores both 
the technical questions of data linkage and the ethical and legal constraints. 
Chapter 6 makes outline recommendations for future work. 

CHAPTER 2 - MODELS AND DATA 



2.1 Underlying model 



The pathways by which child poverty can lead to poor child outcomes - or, 
more generally, how family income explains child outcomes - are set out in 
Fig. 2.1. The model represented by Fig. 2.1 underpins this paper. We 
elaborate this model here and draw out some of its implications. In Chapters 3 
and 4, we look at the model's components in terms of measurement and data. 
In Chapter 6, we recombine these components to show what could be done at 
present to estimate the model, what would be possible with new analyses of 
existing data, and what new data are needed in order to obtain a more 
complete picture of all the pathways. 

Fig. 2.1 Links between child poverty and child outcomes 




Note 

The arrowed lines represent associations that are potentially causal; the 
curved line represents an association that is not causal. 



The outcome variables ('child outcomes') appear on the right hand side of Fig. 
2.1. The key explanatory variable in the model is family (or household) 
income, not so much its level at a particular point in time but more how it 
changes from period to period, and not only its representation as a sum of 
money but also its wider meaning in terms of command over resources over 
time. We have chosen to focus on income in this report, and therefore 
variables like employment status and parental educational qualifications are 
hidden from view, essentially to the left of income in Fig. 2.1. These variables 
may, as it were, partly determine the levels of family income (we elaborate on 
this point below). Chapter 3 sets out in detail what outcomes need to be 
considered, and also describes the measurement of income and income 
dynamics. 

There can be direct links between income and child outcomes. For example, 
an increase in family income can mean that parents are able to afford to have 
their child at school beyond compulsory leaving age. Most of the links, 
however, are indirect. There are groups of intervening variables that can 
provide a better understanding of just how income influences outcomes or, to 
put this another way, variables that can help us to understand what the 
processes might be that lie behind the well-established association between 
income and child outcomes. The upper part of Fig. 2.1 represents processes 
within the family that can mediate, or transmit these influences: the lower part 
represents characteristics of the area in which the child lives that might also 
have an effect. 

The model in Fig. 2.1 is, like all models, a simplification. For example, the box 
labelled ‘processes within the family’ contains a range of variables, some of 
which could themselves be linked in an explanatory framework. One instance 
of a chain of influences might be as follows (where the influences are ordered 
chronologically and '↓' represents a link that is potentially causal): 

fall in income 

↓ 

increased stresses and strains between parents 

↓ 

less time available to spend with their child on educational activities 

↓ 

child does less well at school 

↓ 

increased risk of child's involvement in crime. 

In this example, there are two processes within the family: the parental 
relationship ('stresses and strains') and the time given to the child for 
educational activities. Also, we find that educational attainment is an 
intermediate outcome, part of a process that relates to the risk of crime. But, 
in other cases, educational attainment would be the final outcome, as follows: 

rise in income 

↓ 

better diet 

↓ 

improved child health 

↓ 

more success at school 

Here, health is an intermediate outcome and educational attainment the final 
outcome. 

It is important to recognise that there can be ambiguity about what process 
comes first in a chain. For example, a fall in income could lead to less 
spending on a child’s education, lower educational attainment and hence 
poorer mental health for the child, but it would also be possible for poorer 
mental health (resulting from the loss of a peer group perhaps) to precede 
lower attainment. 

Another simplification in Fig. 2.1 is that it does not allow for 'feedback' 
mechanisms. It is, for example, possible for a rise in income to lead to a child 
staying on at school, in turn creating a more harmonious family situation and 
hence other positive outcomes for the child/young person. Models that allow 
for feedback - sometimes known as non-recursive models - can be difficult to 
estimate statistically. It is beyond the scope of this report to go into these 
issues in any detail but, in Appendix 1, we briefly describe the statistical 
approaches that could be applied. 

There can also be some doubt about whether changes in income are always 
'exogenous' with respect to certain child outcomes. For example, 
long-standing illness or disability in a child - a candidate for an outcome variable - 
can lead to (rather than be caused by) reduced family income if a parent has 
to give up work to care for the child and does not receive a compensating 
benefit. 

The selection of 'process’ variables also needs to be made with care. We 
argue that they should not, for example, include variables like educational 
qualifications and employment status which, although associated with income, 
are not necessarily determined by it. Unemployed or unqualified parents are 
generally poorer - because they are unemployed or unqualified or both. If we 
are interested in the effect of income on child outcomes, it makes little sense 
to dilute that effect by including employment status in the model either as an 
intervening variable or as a control variable that is correlated with income. 

Single parent status is a particularly difficult variable to locate in the model. It 
can describe a mother's position at the time the child is born or her position 
arising from the breakdown of the parental relationship after birth. This 
breakdown could have been precipitated by a fall in family income and could, 
in turn, lead to a further fall in family income as a result of a separation or 
divorce. We might, however, want to consider estimating separate models for 
two and single parent families because the processes linking income to 
outcomes might be different for these two groups (and for other social groups 
as well). There is, for example, some evidence from research in the United 
States that improvements in family income for previously single parents, as a 
result of moving into a new partnership, do not necessarily lead to uniformly 
better outcomes for children (McLanahan, 1997). 

We recognise that it is not always easy to ascribe changes in child outcomes 
with confidence to changes in income. This is especially so when most of the 
available data are observational rather than experimental. (We discuss the 
potential for an experimental approach in Chapter 6.) Although we argue for 
the importance of changes in income that arise from changes in, say, 
employment status, others might reasonably argue that employment status 
itself is the fundamental explanatory variable and the resulting change in 
income is merely an intervening variable. 

Assertions about the importance of income on child outcomes cannot escape 
from the possibility that there are essentially unobservable characteristics of 
parents - sometimes referred to as 'endowments' - that affect both family 
income and child outcomes. To the extent that these are fixed then their 
effects can often be eliminated by examining the effects of changes in income 
on changes in child outcomes (a point we return to in Section 3.3). Another 
way of controlling for them to some degree is to compare outcomes for 
siblings within the same family, subject to the same parental endowments but 
possibly different family and institutional environments. 
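
The differencing argument above can be illustrated with a small simulation. The sketch below is purely illustrative and does not analyse any dataset reviewed in this report: it generates two waves of data for hypothetical families in which an unobserved, fixed parental 'endowment' raises both income and the child outcome, and then compares the regression slope estimated from levels with the slope estimated from changes. The names `simulate_families`, `ols_slope` and `TRUE_EFFECT`, and all parameter values, are assumptions made for the example.

```python
import random

TRUE_EFFECT = 0.5  # assumed effect of income on the outcome (illustrative only)


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_families(n_families=5000, seed=1):
    """Two waves of (income, outcome) per family, sharing a fixed endowment."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        endowment = rng.gauss(0, 1)  # unobserved and constant over time
        waves = []
        for _wave in range(2):
            income = endowment + rng.gauss(0, 1)
            outcome = TRUE_EFFECT * income + endowment + rng.gauss(0, 1)
            waves.append((income, outcome))
        families.append(waves)
    return families


families = simulate_families()

# Levels: pool both waves; the endowment confounds income and outcome.
level_x = [income for waves in families for income, _ in waves]
level_y = [outcome for waves in families for _, outcome in waves]
levels_slope = ols_slope(level_x, level_y)

# Changes: wave 2 minus wave 1; the fixed endowment cancels out.
diff_x = [waves[1][0] - waves[0][0] for waves in families]
diff_y = [waves[1][1] - waves[0][1] for waves in families]
changes_slope = ols_slope(diff_x, diff_y)

print(f"slope from levels:  {levels_slope:.2f}")
print(f"slope from changes: {changes_slope:.2f}")
```

Under these assumptions the slope from levels overstates the effect of income, because it absorbs the endowment, while the slope from changes should lie close to the assumed value of 0.5. The same cancellation would not hold if endowments changed over time.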

Turning to the lower part of Fig. 2.1, there will usually be an association 
(shown by the curve joining the two boxes) between family income and area 
characteristics. On average, although certainly not exclusively, poor families 
live in disadvantaged areas. It is widely believed (although the evidence base 
for this belief is not strong) that family influences on child outcomes are 
stronger than area influences (McCulloch and Joshi, 2001b). Hence, the 
important question is whether area characteristics have an influence on child 
outcomes having allowed, or statistically controlled, for family characteristics 
and parenting behaviours. Allied to this question is one about the effect 
service quality has on outcomes - can a good local school, for example, 
mitigate the effects of poverty and a run-down neighbourhood? Some studies 
(e.g. Mortimore et al., 1988 [pp 214-216]), suggest that ‘effective schools’ may 
in part be able to do this. We return to these issues in Chapter 4. 

We should note at this point that our remit was the relatively narrow one of 
considering the links between child poverty and child outcomes, not the broad 
issue of considering all the ways in which family characteristics and parenting 
behaviours might be associated with child outcomes. In other words, our 
model - represented by Fig. 2.1 - aims to set out the links between income 
and outcomes, rather than all the links that might account for variability in 
these child outcomes, not all of which will be related to income. 

A time line - from left to right - is implicit in Fig. 2.1. This reinforces our view 
that longitudinal data - with their focus on individual change - are likely to be 
much more useful than cross-sectional data - which focus on levels at a point 
in time - for estimating any model taking this general form. We do not, 
however, usually know what time lags to expect - how quickly does a rise in 
income, perhaps induced by a change in policy, lead to, say, a more stable 
family environment and how quickly is that transmitted into higher educational 
attainments for children? And time lags for effects emanating from rises in 
income might be different from those emanating from reductions in income. 
Indeed, there is no reason to assume that the effects of changes in income 
will be symmetrical: a rise in income might have a stronger 
effect on an outcome than a fall of the same magnitude, especially if the 
effects of the fall can be mitigated by drawing on savings accrued during the 
period of higher income. 
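
One way of making this asymmetry testable, offered here as an illustrative specification rather than one taken from the studies reviewed, is to let rises and falls in income carry separate coefficients. The symbols are assumptions introduced for the example: y is a child outcome and x is income for family i at time t, and Δ denotes the change between successive measurements.

```latex
\Delta y_{it} = \beta^{+}\,\max(\Delta x_{it}, 0) + \beta^{-}\,\min(\Delta x_{it}, 0) + \varepsilon_{it}
```

Symmetry corresponds to the restriction \beta^{+} = \beta^{-}; estimates that differ would point to rises and falls in income operating differently.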

It would also be possible to extend Fig. 2.1 - essentially horizontally to the 
right - to capture inter-generational effects. This would then represent possible 
'cycles of disadvantage' (or advantage), whereby grandparents' low income 
leads to poor outcomes and hence low incomes for the parent and this, in 
turn, leads to poor outcomes for the child. 

We have set out a model in this section that will guide us through the various 
data and analysis issues in subsequent chapters. Our treatment of this topic is 
necessarily rather brief and schematic. It is also rather general; more specific 
research questions would lead to refinements. This is, to some extent, brought 
out in Chapter 3 where the income, process and outcome variables are 
discussed in a more explicit way. We have, nevertheless, discussed some of 
the implications, limitations and possible elaborations of a model of this kind. 



2.2 Types of data 

We have reviewed a wide range of data types, set out below: 

2.2.1 Cross-sectional survey data 

This type of data provides a snapshot of information about individuals at a 
particular time point and, if the surveys are repeated, a series of snapshots of 
different individuals at different points in time. Although, with cross-sectional 
data, it is not possible to trace individuals over time, it is clearly possible to 
make comparisons over time at an area level with repeated cross-sectional 
data, provided that area boundaries (and questions asked) remain consistent. 
Cross-sectional studies would not normally provide the type of data needed to 
study relationships of the type set out in Fig. 2.1. They might, however, be 
part of a process whereby a suitable group was identified and ‘screened’ for 
more detailed study. 

2.2.2 Longitudinal surveys and panel studies 

Surveys and studies in this category are designed so that changes at an 
individual level can be tracked, thereby meeting a major weakness in cross- 
sectional studies. An issue with the major longitudinal birth cohort studies is 
the relative infrequency of measurement, restricting for example the analysis 
of income dynamics. Panel studies that typically have a shorter cycle (often 
annual or biannual) and follow a representative sample of panel members in 
households, including to the new household units they may subsequently form 
or join, can provide the necessary frequency of measurement to study such 
dynamics. The weakness in such studies might be the relatively small number 
of children, particularly of those growing up in poverty or in different ethnic 
groups, unless this was tackled by disproportionate sampling of such groups 
for this purpose. 

2.2.3 Administrative data 



Administrative data are principally those collected by central or local 
government for administrative purposes rather than for research. Such data 
are increasingly being made available for analysis by researchers outside 
government. The data can be divided into ‘event based information’ (e.g. 
registration of birth or death, examination results) or some form of continuous 
record that can be sampled at a specific point in time (e.g. DSS/DWP benefit 
claims). 

Administrative data can be analysed both cross-sectionally and longitudinally. 
Thus it has been possible to string together benefit datasets by using 
individual National Insurance Numbers (NINOs). Much of the administrative 
data (e.g. Child Benefit, Income Support) contain a very specific set of 
information and so the amount of background information is limited. However 
it is, in many cases, virtually a census of children or total population in a 
specific category. As well as being 100% comprehensive geographically and 
therefore ideal for small area or individual-level analysis, it also has the 
advantages of being non-intrusive for the people involved (subject to data 
protection issues being met) and cost-efficient as it is already routinely 
collected. A weakness of benefits data is that they are restricted to those 
claiming - not, in general, the same as those eligible for that particular benefit 
unless take-up rates are very close to 100%. However, it is also the case that 
all survey data suffers from response rate problems, and this may be more 
pronounced for those living in poverty or in disadvantaged areas. 
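
The idea of stringing together successive benefit extracts by a common identifier can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how DSS/DWP data are actually held or linked: the extract dates, the identifiers (`ID001` etc., standing in for pseudonymised NINOs) and the function name `link_extracts` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts: one list of (identifier, benefit) pairs per scan date.
extracts = {
    "1999-08": [("ID001", "Income Support"), ("ID002", "Income Support")],
    "2000-08": [("ID001", "Income Support"), ("ID003", "Income Support")],
    "2001-08": [("ID001", "Income Support"), ("ID002", "Income Support")],
}


def link_extracts(extracts):
    """Return {identifier: scan dates, in order, at which a claim was live}."""
    history = defaultdict(list)
    for scan_date in sorted(extracts):
        for identifier, _benefit in extracts[scan_date]:
            history[identifier].append(scan_date)
    return dict(history)


histories = link_extracts(extracts)

# Identifiers present at every scan date: a crude marker of persistent claiming.
persistent = [
    identifier
    for identifier, dates in histories.items()
    if len(dates) == len(extracts)
]

for identifier, dates in sorted(histories.items()):
    print(identifier, dates)
print("present at every scan:", persistent)
```

Because each extract records only live claims, an identifier that is missing at a scan date may reflect a move off benefit, a change of benefit or a failure to claim; the linked history cannot by itself distinguish these.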

2.2.4 Service quality data 

Data are now increasingly routinely collected on the quality of services (e.g. 
OFSTED reports for schools) and in some cases in numerical format (e.g. 
rating scales or quantitative process or outcome measures). Thus the 
OFSTED database contains not just school reports, but also numerical scales 
recorded during the inspection at school, subject and class level. The data 
also includes some ‘value added’ information in the PICSI (Pre inspection 
school and social context reports) and PANDA (Performance and 
Assessment) systems which will be extended as more linked individual pupil 
performance data become available following the introduction of the unique 
pupil numbering system. The OFSTED database also includes details on all 
institutions providing state supported education for three and four year olds, 
and will, from 2002, include a register of all childcare provision with quality 
assessment data. Service quality data might serve as an important and low 
cost supplement to individual survey data to complement consumers' views 
on service quality. 

2.3 Criteria for including surveys and administrative data 



The criteria we have used to identify surveys and administrative data for 
inclusion in this report are that: 

(i) it contains individual data on income (from all sources), or, at the 
very least, valid income proxies, ideally enabling the analysis of 
income dynamics. 

(ii) it contains individual data on one or more key child outcomes 
(as defined in Chapter 3), or scope for data linkage to add such 
data from other sources preferably at an individual level by, for 
example, accessing educational performance data at an 
individual level and linking this to survey data. 

(iii) there is some possibility of geocoding these individual data to 
link them to a ‘neighbourhood’ or proxy for such, ideally by 
address or postcode. 

(iv) it contains data on transmission mechanisms or processes (as 
defined in Chapter 3). 

(v) there is the possibility of linking individual data to data on the 
accessibility of services and their quality. 

We have not come across any data sources entirely satisfying all five criteria. 
We have, therefore, applied a more inclusive criterion and have considered 
studies or datasets that meet at least conditions (i) and (ii), and if only (i) and 
(ii) then there should be some data on income dynamics as well as on levels. 
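
The screening rule just described can be written out explicitly. The sketch below is one reading of that rule, with hypothetical field names and made-up example sources; it is not a record of how the datasets in Appendix 2 were actually classified.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    has_income: bool           # criterion (i): income or valid proxies
    has_child_outcomes: bool   # criterion (ii): outcomes, directly or by linkage
    can_geocode: bool          # criterion (iii): link to a neighbourhood
    has_process_data: bool     # criterion (iv): transmission mechanisms
    can_link_services: bool    # criterion (v): service access and quality
    has_income_dynamics: bool  # repeated income measures, not just levels


def meets_inclusion_rule(src: DataSource) -> bool:
    """Require (i) and (ii); if nothing else is met, also require dynamics."""
    if not (src.has_income and src.has_child_outcomes):
        return False
    meets_other = (
        src.can_geocode or src.has_process_data or src.can_link_services
    )
    return meets_other or src.has_income_dynamics


# Made-up sources for illustration only.
candidates = [
    DataSource("hypothetical panel study", True, True, False, False, False, True),
    DataSource("hypothetical one-off survey", True, True, False, False, False, False),
    DataSource("hypothetical adult-only survey", True, False, True, False, False, True),
]

included = [src.name for src in candidates if meets_inclusion_rule(src)]
print(included)
```

Here only the hypothetical panel study passes: the one-off survey meets (i) and (ii) alone but has no data on income dynamics, and the adult-only survey lacks child outcomes.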

The datasets we have reviewed are listed and grouped in tabular form in 
Appendix 2. Please refer to this Appendix for further details of studies referred 
to in the main text. 

Generally, as suggested earlier, the most useful data sources are those that 
are longitudinal, and include children in their samples. There are a number of 
cross-sectional surveys collecting data only about adults (usually 16 and over) 
- such as the General Household Survey (GHS) and the Labour Force Survey 
(LFS) - that are extremely limited in terms of providing data to answer the 
central questions posed by this study, though they may provide the source of 
‘screening’ data for subsequent more focussed surveys. The large sample for 
the LFS does, however, mean that it might be used to provide area data on, 
for example, qualifications, health, working patterns as well as income at a 
reasonably low (at least local district) level of aggregation, especially if data 
from successive years were amalgamated. There are typically about 17,000 
cases aged under 25 in each year’s LFS data covering the whole of the UK. 
The LFS, because of its sampling procedure, has a short longitudinal element 
as cases are retained over one year. This has ingeniously been used to study 
short-range dynamics in income following unemployment (Gregg and 
Wadsworth, 2000). 

In this chapter we have set out the broad research and data requirements 
needed to link child poverty and child outcomes. We now move on to consider 
the measurement of the various parts of the proposed model, and to describe 
studies and datasets that could be used to estimate it. 
CHAPTER 3 - INCOME AND RESOURCES, PROCESSES AND OUTCOMES 



3.1 Introduction 



In this chapter, we list the concepts and variables that need to be measured. 
We do this first for household income and resources, then for child outcomes, 
and finally for the intervening processes that may link growing up in poverty 
with subsequent development. 

In Appendix 2, we review in summary form the data that are available (or may 
become available) in order to estimate the strengths of the pathways set out 
earlier (Fig. 2.1). The topic of neighbourhood effects and services is covered 
in Chapter 4. 



3.2 Income and resources 



In order to examine the outcomes for children growing up in poverty we must 
have good measures of how severe and prolonged such poverty has been. A 
number of points need to be made before we simply equate childhood poverty 
with household income. 

First (without going into an extended discussion of the measurement of 
poverty), there are many ways that poverty experienced by children might be 
measured. Increasingly such poverty has come to be measured in terms of 
relative income - that is, equivalised household income falling below a point 
such as 50% or 60% of the median income. This is the measure used in the 
Households Below Average Income (HBAI) series, now based on the annual 
Family Resources Survey (FRS), where the series is presented both before 
and after adjusting for housing costs (BHC, AHC). Similar measures are used 
in cross-national statistics (e.g. Bradbury and Jantti, 1999). These measures 
clearly fit the league table or monitoring function approach where such data 
can be compared over time in the same country or cross-nationally, using 
either the threshold point at a fixed time period or updating it to the current 
year. However, this way of measuring poverty might not necessarily fit so 
easily with the different requirements of unravelling the links between children 
growing up in poverty and subsequent outcomes, where the threshold of 50% 
or 60% of the median income may or may not be so significant a cut-point. We 
raise this not to propose an alternative but simply to point out the step from 
‘child poverty’ to a particular threshold point on the overall income distribution. 
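
As an illustration of how a relative income threshold of this kind is derived, the following Python sketch equivalises household incomes and sets the poverty line at 60% of the median. It rests on several assumptions that are not taken from the HBAI series: the equivalence scale is the modified OECD scale (1.0 for the first adult, 0.5 for each further adult, 0.3 for each child), chosen here only for simplicity; the median is taken across households without weighting; and the incomes are invented.

```python
from statistics import median


def equivalised(income, adults, children):
    """Divide income by an equivalence scale (modified OECD, assumed here)."""
    scale = 1.0 + 0.5 * (adults - 1) + 0.3 * children
    return income / scale


# Invented households: (weekly income, number of adults, number of children).
households = [
    (400.0, 2, 2),
    (150.0, 1, 1),
    (300.0, 1, 0),
    (600.0, 2, 0),
    (210.0, 2, 3),
]

equivalised_incomes = [equivalised(*h) for h in households]

# Relative poverty line: 60% of the median equivalised income.
threshold = 0.6 * median(equivalised_incomes)

# True where a household falls below the line.
below_line = [income < threshold for income in equivalised_incomes]
```

Changing the 0.6 multiplier to 0.5 gives the alternative cut-point mentioned above; the sketch makes plain that either choice is a point on the overall distribution rather than a property of the households themselves.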

Second, it is increasingly clear that if we are to take ‘command over resources 
over time’ seriously, then some form of repeated income measure or ‘income 
dynamics’ data are required. The frequency of this information is also critical. 
Thus, recent studies using newly available UK data (e.g. Hill and Jenkins, 
2001), including administrative data (Platt, forthcoming), confirm the picture 
from earlier US studies that more frequent data extracts show much more 
‘mobility’ than might have been expected, even though this mobility may be 
short range and may vary for different groups and areas. 

The requirement for systematic household income data collected in a 
standardised way at repeated time points, ideally on an annual basis, rules 
out all but a very few studies. Probably only the British Household Panel 
Survey (BHPS) currently meets such a requirement over an extended time 
period, and covering the whole of the income distribution. 

Third, to establish household income and resources requires information on 
other household members, to calculate ‘equivalence’ income scales, as well 
as data on housing costs for AHC estimates. It requires a very heavy battery 
of questions on income, benefits received and household assets, which 
compete with other topics for space in surveys. Table 3.1 sets out some of 
these elements in more detail. Those concerned to measure other aspects of 
child outcomes are often reluctant to devote more than a limited number of 
questions to this area (not least because heavy questioning on income is 
thought likely to reduce overall survey response rates). Alternative 
approaches that might be considered would include the approach to 
measuring poverty that uses a lack of socially perceived ‘necessities’ in 
addition to direct measures of income. This approach, pioneered in the Mack 
and Lansley (1983) study Poor Britain, later updated as Breadline Britain 
(Gordon and Pantazis, 1997), has now been brought up to date in the national 
survey Poverty and Social Exclusion in Britain (Gordon et al., 2000). This 
latest study used the ONS Omnibus Survey to establish, on a national 
sample, the consensus on what items were perceived as ‘necessary’ and 
used a follow-up sample from the GHS to establish which households 
possessed these items, and whether this was by choice or for financial 
reasons. Thirty items were identified as socially perceived necessities for 
children and the proportion of children lacking these items was assessed. The 
survey also included a number of other ways of measuring poverty and social 
exclusion, including a subjective measure. 

Given the high cost of the full battery of questions to assess household 
income it may be worth examining studies of this type for effective proxies for 
the full income measures, rather than simply using a limited number of 
questions on income (e.g. income bands) that may not provide comparable 
and reliable data. 

Fourth, in considering ‘child poverty’ we should raise the difficult question of 
intra-household income transfers. In principle, we need to know something 
about how effectively household income reaches children. This may raise 
long-running questions about which adults are formally paid the 
income/benefits intended for their children, but also something about direct 
expenditure on children. It may well be that poor parents spend 
disproportionately more on their children than they receive in benefits. 
Witherspoon et al. (1996) showed, in terms of non-dependant deductions for 
housing benefit, that some parents shielded their non-dependant young from 
the full rent contribution. It would be important to throw light on some of these 
areas if we are to trace income effects on child outcomes. For obvious 
reasons many of the studies that deal with intra-household income and 
expenditure are small qualitative studies, rather than large scale surveys. 



Table 3.1: Some requirements for measuring household income 

Levels 

• Weekly/monthly/annual wage/salary for each earner in household; 

• Other income - investment, private means, 'informal economy'; 

• Means-tested and other benefits (JSA, Income Support, Housing and Council 
Tax benefits etc.); 

• Disability related benefits (IB, DLA). 

'Wealth' 

• Level of savings/assets; 

• Level of debt. 

Income/Wealth proxies 

• Car ownership; 

• Tenure: own house or mortgage; 

• Council Tax band on house; 

• Number of consumer durables, essential household items. 

Housing Costs 

Other 

• 'Shock absorbers' - insurance, savings, parental contributions etc. 

• Data on disability and other needs that may require higher income for the 
same standard of living. 



3.3 Income variability over time 



There are a number of different ways in which income can vary over time, as 
illustrated by the four panels in Fig. 3.1. Periodic measurement, ideally on an 
annual basis, is needed to distinguish between the different patterns. These 
patterns could have different implications for child outcomes. Thus, families 
falling into the type represented by panel (a) move in and out of poverty rather 
frequently, perhaps as a result of unstable employment patterns. Their 
children's outcomes might or might not be better than those families in panel 
(d) which stay permanently in poverty. In many ways, the most 'useful' 
patterns for research investigations are those shown in panels (b) and (c) as 
they describe situations of real change, either improvements or declines. If 
there is indeed an explanatory link between income and child outcomes then 
the expectation would be that outcomes for children in families in panel (b) 
would change for the better, and perhaps be as good as those for children in 
families in the top group in panel (d). Children in families subject to 
misfortunes that place them in panel (c) would have worsening outcomes, 
perhaps worse than those in panel (a) and as bad as those in the bottom 
group in panel (d). 
Fig. 3.1: Income dynamics 

[Figure not reproduced: panels (a) to (d), each showing income over time.] 

It is beyond the scope of this report to go into detail about how to measure 
income (and poverty) dynamically, to take account both of transitions between 
income levels and also of durations in these different levels. The report on a 
collection of US studies edited by Duncan and Brooks-Gunn (1997) uses a 
three-way classification - always, sometimes and never poor - to measure 
income dynamics. Although considerably more useful than a single cross- 
sectional measure (poor/not poor), such a measure is somewhat limited in 
that it is not explicitly based on change, and hence ignores both the direction 
and magnitude of any changes. As we have just pointed out (and as we 
discussed in Chapter 2), we are likely to get a more complete understanding 
of the links between child poverty and child outcomes by analysing the effects 
of changes in income on outcomes. Consequently, an analysis based on 
income change, and taking into account both the direction of change and the 
possibility that the effects of change vary according to initial income levels, is 
likely to be more informative. 
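
The limitation of the three-way classification can be seen in a small sketch. The Python below is illustrative only: the poverty line, the function names and the two income histories are invented, and the change measure (last year minus first year) is just one simple assumption about how direction and magnitude might be summarised.

```python
def poverty_status(incomes, threshold):
    """Three-way classification: 'always', 'sometimes' or 'never' poor."""
    below = [income < threshold for income in incomes]
    if all(below):
        return "always"
    if any(below):
        return "sometimes"
    return "never"


def income_change(incomes):
    """Signed change between the first and last measurement."""
    return incomes[-1] - incomes[0]


THRESHOLD = 120.0  # hypothetical poverty line, in the same units as the incomes

# Invented annual incomes for two families.
rising = [90.0, 100.0, 130.0, 150.0]    # an improving pattern
falling = [150.0, 130.0, 100.0, 90.0]   # a declining pattern

summary = {
    "rising": (poverty_status(rising, THRESHOLD), income_change(rising)),
    "falling": (poverty_status(falling, THRESHOLD), income_change(falling)),
}
```

Both families are classed as 'sometimes' poor, yet their incomes move in opposite directions by the same amount; only the change measure separates them.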



In addition to survey data, administrative data extracts can provide some 
handle on variability among low-income households. Linking WFTC to IS/JSA-IB 
data at an individual claimant level would form a major dataset for families 
with children as WFTC goes some substantial way up the income distribution. 
The possibility of linking such administrative data is reviewed in Chapter 5. 

We could consider using changes in income proxies as proxies for changes in 
income. However, housing tenure or car ownership, for example, might be 
much 'stickier', especially in terms of downward movements, than income. 
Social mobility - intra-generational changes in social class - will also be 
correlated with income changes but, again, people can be socially mobile 
without changes in their income, and can remain in the same social class (or 
socio-economic group) whilst experiencing substantial changes in income. 



3.4 Child outcomes 



We have, for the purposes of this paper, adopted the following definition of a 
'child': 

• all children and young persons up to their twentieth birthday. 



Hence, anything that happens to them up to the age of 20 is potentially an 
‘outcome’. This includes, for the great majority, applications and entries to 
Higher Education (HE) but not HE outcomes, and excludes some early labour 
market experiences. Given the increasingly extended period of transition from 
dependent to independent adult status, there are arguments for extending the 
age limit upwards to 25 years, when the benefit system treats a claimant as 
eligible for the full adult rate. However, it is probably better to see the process 
as a series of different transitions that now take place from the mid-teens 
to the late 20s - from school or education into employment, in housing or in 
family formation. If we wanted to encompass all this we would have to extend 
until this later stage of family formation and parenthood, now postponed for 
many to their late 20s or early 30s. 

The list of child outcomes is potentially very long. We have focused on 
'objective' outcomes which we have grouped into four ‘domains’: education, 
health and psychological outcomes, crime, and a miscellaneous group of 
outcomes in late adolescence, many associated with transition to adulthood 
and independence. We do, however, consider some 'subjective' and 
attitudinal measures. The distinctions - and overlaps - between outcome 
variables and the process variables described in Chapter 2 need to be borne 
in mind. It is beyond the scope of this paper to go into detail about specific 
instruments for measuring the outcomes. 

3.4.1 Educational outcomes 

These can be grouped into attainments, behaviours, and attitudes. 

(a) Attainment: 

• Baseline scores at entry to primary school around the age of 5 - these 
should be available in a standardised form by 2003. 

• National Curriculum tests in English, Mathematics and Science - the 
SATs - at ages 7, 11 and 14. 

• Examination results at age 16 (GCSE) and ages 17 and 18 (A/S and A 
levels). 

• Standardised tests of reading, mathematics etc., collected for research 
purposes. 

It is important to remember that assessment arrangements in Scotland differ 
in important respects from those in England, Wales and Northern Ireland. 

Also, the range of post-16 qualifications continues to expand. 

(b) Behaviours: 

• Staying on at school or in Further Education beyond age 16. 

• Application to HE. 

• Entry to HE. 

• Entry into training. 

• Truancy/absenteeism. 
• Exclusion from school (temporary and permanent). 

• Special Educational Needs (5 stages). 



(c) Attitudes: 

• Educational aspirations (especially at younger ages). 

• Attitudes towards the value of lifelong learning. 

3.4.2 Health and psychological outcomes 

These can be grouped - somewhat arbitrarily - into states, behaviours, and 
beliefs and perceptions. 

(a) States: 

• Acute and chronic morbidity (both physical and mental and to include 
rare conditions such as 'neglect'). 

• Disabilities. 

• Accidents (both within and outside the home). 

• Height and weight (including obesity). 

• Dental health. 

• Levels of fitness. 

(b) Behaviours: 

• Exercise and sporting activity. 

• Diet and eating habits. 

• Use of tobacco. 

• Use of alcohol. 

• Use of drugs. 

• Sexual behaviour in adolescence. 

(c) Beliefs and perceptions: 

• Determinants of good health. 

• About oneself - self-esteem. 

• Behaviour problems (as perceived by parents and by teachers). 

3.4.3 Crime outcomes 

• Cautions. 

• Convictions, fines, and incarceration. 

• Attitudes towards the law, and towards illegal acts. 

3.4.4 Miscellaneous 

• Early/teen and lone parenting. 

• Homelessness. 
• Early labour market experiences. 






3.5 Intervening (transmission, process) variables 

These can be grouped into the domains used for the child outcomes in Section 
3.4. In addition, there are some variables that apply across the domains. It is 
important to bear in mind the points made in Chapter 2 about not treating 
variables that could determine income as intervening variables. There is an 
element of doubt about the status as intervening variables of a few of the 
variables listed below, and these are indicated with an asterisk (*). 

(a) Intervening variables for education 

Educational inputs and activities at home and from the wider family: 

• reading to, and hearing children read; 

• helping with homework; 

• using the Internet for homework etc.; 

• attending parents' meetings etc.; 

• number of books in the home; 

• educational visits. 

(b) Intervening variables for health and psychological outcomes 

• parent-child interactions; 

• diet/nutrition; 

• parents' health behaviours - smoking, alcohol abuse*, drug use*; 

• home safety precautions. 

(c) Intervening variables for crime outcomes 

• parental involvement with*, and attitudes towards, crime. 

(d) Intervening variables for miscellaneous outcomes 

• parental control; 

• advice about sex, contraception etc. 

(e) Cross-cutting variables 

• housing stress and physical quality - persons per room, damp etc.; 

• family size and composition (but not family type); 

• parental control - setting boundaries etc.; 

• how income is used. 
3.6 Datasets 



Appendix 2 sets out the studies we have reviewed in terms of how well they 
cover the different elements discussed in this chapter. We have grouped them 
into categories or types - those studies that we believe are or will be useful for 
analysing the links between child poverty and child outcomes, sub-divided into 
surveys, evaluation studies and aggregate datasets; those that could make a 
contribution, albeit limited; and those that we considered but decided had little 
to offer in this particular area of investigation, valuable as they are for other 
questions. We also consider how well the different studies match up to the 
requirements for research in this area and, to a degree, their relevance for 
policy. 
CHAPTER 4 - EFFECTS OF NEIGHBOURHOOD AND SERVICES 



4.1 Introduction 



The model set out in Chapter 2 (Fig. 2.1) highlights the possibility that where a 
child lives (or used to live) has an effect on their outcomes, over and above 
the circumstances in which they grow up at home. It might be the case, for 
example, that a child's life chances are improved if they grow up in a poor 
family who happen to live in an advantaged area or neighbourhood, or 
diminished if they grow up in a rich family living in a disadvantaged area. 
There could also be a 'double penalty' of growing up poor in a disadvantaged 
area and a 'double reward' of growing up rich in an advantaged area. More 
detailed discussion of these issues, and the social theories behind them, can 
be found in Sampson et al. (1999) and McCulloch and Joshi (2001b). Similar 
questions have been widely debated in the health care field (see, for example, 
Sloggett and Joshi, 1994), and underpin the debate about 'underclass' and 
'social exclusion' (Crane, 1991; Glennerster et al., 1999; Lupton, 2001). 

As well as questions about measurement and analysis, hypotheses of this 
kind raise a number of conceptual questions. Perhaps the most fundamental 
of these is what we mean by 'neighbourhood' and how closely the usual ways 
of defining neighbourhoods spatially relate to what individuals themselves 
regard as their 'neighbourhood', which may increasingly be influenced by the 
importance of non-spatial networks. With technological advances, it is now 
very common to define neighbourhoods by administrative boundaries - for 
example, wards and postcode sectors in the UK, Census tracts and ZIP codes 
in the US. This has the considerable advantage of defining a collection of 
mutually exclusive and exhaustive groups but at the expense of a loss of 
construct validity in terms of the contextual effects we would like to capture. (A 
useful discussion of these issues is given in Hinds et al., 2000.) Construct 
validity will be reduced first, if there is variability within individuals about what 
they regard as their neighbourhood, which could depend on the domain of 
interest (services, shops etc.), and on the ages of their children. Second, 
individuals within a ward, say, are likely to vary in their views about what they 
perceive to be their neighbourhood. Ideas and perceptions about 
neighbourhood and 'community' may well differ between urban and rural 
areas - or indeed by ethnic group. Actual behaviour may vary as well. For 
example, distance travelled to secondary school by pupils of Bangladeshi 
origin in one major conurbation was approximately half that travelled by white 
pupils. African Caribbean pupils typically travelled 30% further than white 
pupils - only 17% attended the physically nearest secondary school, whereas 
nearly 60% of Bangladeshis attended their physically nearest secondary 
school (Smith et al., 1999). 

There are other possibilities for defining neighbourhoods. Travel to Work 
Areas (TTWAs) have been constructed from commuting flows for the 1981 
and 1991 Censuses to define local labour markets. They therefore have some 
face validity but are generally too large to serve as local neighbourhoods. 
Thus London is treated as effectively two TTWAs. Primary school catchment 
areas are another possibility but they are not necessarily mutually exclusive. 
Census Enumeration Districts (EDs) - Census Output Areas (OAs) in 2001 - 
are perhaps too small, though they could in principle be the building blocks for 
larger areas. Local Authority Districts are almost certainly too large. One way 
round at least some of the difficulties highlighted here might be to use 
boundaries defined by tenants' and residents' associations, but these 
organisations are far from exhaustive, and themselves often have overlapping 
and varied catchment areas. 

As well as the conceptual problems that have to be faced, there are also 
important practical matters to consider. In particular, neither wards nor 
postcodes have boundaries which are fixed in stone. Instead, the Electoral 
Commissioners regularly change ward boundaries to reflect changes in 
populations, eliminating some wards and creating new ones. The Post Office 
is driven by the workloads of postal workers, not the concerns of social 
scientists. The basic household postcoding system (the full seven character 
code) may be changed or reissued after a period of time. Many of the 
boundaries that are set for other services (health, police etc.) do not correspond 
to the same areas. The PAT18 report on Better Information (Social Exclusion Unit, 
2000) grappled with this critical issue (in Annex G) and came up with a 
number of recommendations about setting consistent and centrally registered 
changes in boundaries. At present, only something like the Ordnance Survey 
national grid (eastings and northings) remains consistently the same and is 
included in the postcode directories for each postcode centroid. The position 
is, however, rapidly improving, as are the technologies using various 
Geographic Information Systems (GIS) to link information collected to different 
boundaries (e.g. from census to census where district, ward and ED 
boundaries may all be different). 

As we will often be interested in the effects of changes in neighbourhoods 
('neighbourhood dynamics') on outcomes, it is important to be able to 
separate genuine changes in the local context from changes induced by 
boundary alterations. This could prove to be a very complex exercise in a 
national level dataset. 



4.2 Measurement issues 



There are a number of ways of measuring area characteristics: 

1. With aggregate administrative data; 

2. With modelled survey data; 

3. With aggregate survey data; 

4. By aggregating individual survey responses; 

5. By different kinds of systematic observation. 

4.2.1 Aggregate administrative data 



The amount of information available from administrative sources has 
increased very rapidly in the past two years. Typically these are very large, 
intensively postcoded (better than 99%) extracts from central record systems 
(many from the DWP which covers GB and may also process the data for the 
equivalent department in Northern Ireland). Thus, the Income Support/Income 
Based Job Seeker’s Allowance (IS/JSA-IB) datasets contain several million 
cases and can be used to indicate the local prevalence of, for example, 
people in receipt of these major means tested benefits. Similarly, data from 
the JUVOS system are used to provide the monthly claimant based 
unemployment counts at ward level. The claimant data may be aggregated to 
1991 ward boundaries, as in the NOMIS system. However, access to 
individual-level records also allows the data to be aggregated to other 
geographies. For example, the Oxford Index team recast the unemployment 
count to 1998 ward boundaries (which fitted to unitary authorities in England 
in the late 1990s when the Indices of Deprivation were constructed). 

Thirty-three indicators covering six ‘deprivation domains’ were used to 
construct the English Indices of Deprivation 2000 (ID 2000) for every ward in 
England (Noble, Smith et al., 2000a). The ID 2000 and some of the 
administrative data used for its construction are now available through the 
ONS neighbourhood statistics website. The Welsh Index of Multiple 
Deprivation 2000 measures six domains of deprivation at the Electoral 
Division level (Noble, Smith et al., 2000b). The Measures of Deprivation for 
Northern Ireland (Noble, Smith et al., 2001) contains new data such as 
prescriptions for depression or anxiety used to measure mental health at the 
local level; crime data; and individual-level education performance data for all 
Northern Ireland school leavers over a three year period. This index is 
available at ward level for all wards in Northern Ireland to 1991 boundaries. In 
addition, for Northern Ireland, three Enumeration District measures have been 
produced: Income Deprivation, Employment Deprivation, and Economic 
Deprivation, the latter an equally weighted combination of the Income and 
Employment measures. These provide information about pockets of 
deprivation within wards. Altogether, these measures provide a rich backdrop 
of information about the area of residence. 

4.2.2 Modelling survey data 

There are several ways that large-scale national surveys have been used to 
construct small area estimates. At one level this can simply be through 
aggregating successive cross-sectional survey data. For example, Berthoud 
(2001) aggregated two years of Family Resources Survey (FRS) data to 
generate an estimate of household income at postcode sector level. 
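
The basic idea of pooling successive cross-sections can be sketched as follows. This is a minimal Python illustration and not the procedure used by Berthoud (2001): the records and area codes are invented, the estimate is a simple unweighted mean, and no survey weights or adjustment for price changes between years are applied, all of which a real analysis would need to consider.

```python
from collections import defaultdict
from statistics import mean

# Invented records: (survey year, area code, weekly household income).
records = [
    (1998, "AB1 2", 310.0),
    (1998, "AB1 2", 250.0),
    (1999, "AB1 2", 280.0),
    (1998, "CD3 4", 420.0),
    (1999, "CD3 4", 460.0),
    (1999, "CD3 4", 440.0),
]


def pooled_area_means(rows):
    """Pool all survey years and return (mean income, sample size) per area."""
    by_area = defaultdict(list)
    for _year, area, income in rows:
        by_area[area].append(income)
    return {area: (mean(values), len(values)) for area, values in by_area.items()}


estimates = pooled_area_means(records)
```

Returning the sample size alongside each mean is a deliberate choice in this sketch: pooling years raises the number of cases per area, and that count indicates how much reliance an area estimate can bear.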

‘Modelling down’ techniques on national survey data to provide estimates for 
different types of areas (e.g. working class estates in the north west region) 
have been used with attitudes surveys to give estimates of local perceptions 
in the ‘Geography of Misery’ series (Burrows and Rhodes, 1998). However, 
the most developed work in this field is now being undertaken by the Small 
Area Estimation Programme in the Methods and Quality Division at ONS. This 
programme involves joint work with seven National Statistical Institutes 
(EURAREA) to develop the theory, methods and application. For more 
information, see: 

www.statistics.gov.uk/nsbase/methods_quality/eurarea/default.asp. 
Heady and Hennell (2000) illustrate the way these techniques work to derive 
small area estimates of income using data from the FRS. Other possible small 
area (usually ward, sometimes local district) estimates for unemployment, 
children’s mental disorders, and variables of interest to the ONS 
Neighbourhood Statistics initiative are also being explored within this group. 

4.2.3 Aggregate survey data 

Probably the best source of aggregate survey data is provided by the Census 
in that it covers the whole of the UK in detail. Census data are, however, 
limited in scope (by excluding any questions about income, for example) and 
quickly become out of date. The currently available data (from the 1991 
Census) are likely now to be somewhat inaccurate, but the situation should 
improve by 2003 as the Small Area Statistics generated by the 2001 Census 
become available. Other large, repeated surveys - most notably the Labour 
Force Survey - offer the opportunity to generate some aggregate data at the 
local level, including income data, by combining surveys from a number of 
years (see Section 2.3). 

There are, however, dangers in using any kind of aggregate data (for a ward, 
say) to represent all families in a ward. It is, for example, possible that the 
neighbourhood characteristics of those families living close to the boundaries 
of a ward are better represented by the aggregate of the adjacent ward. 

The balance of the research evidence - most of it from the US - suggests that 
the best way of estimating neighbourhood effects is to measure variables that 
have face validity as predictors of child outcomes. So, for example, a measure 
of air pollution might predict child health but could hardly be expected to be a 
predictor of crime whereas a measure of social cohesion might predict crime 
and not child health. 

4.2.4 Aggregating individual survey data 

One study that has done a lot to further the measurement and analysis of 
neighbourhood effects is the Project on Human Development in Chicago 
Neighborhoods (PHDCN). This is a major interdisciplinary study aimed 
at deepening society's understanding of the causes and pathways of juvenile 
delinquency, adult crime, substance abuse, and violence (for more 
information, go to http://phdcn.harvard.edu/). It is a longitudinal study with 
between 20 and 50 households selected from each of 343 Chicago areas 
(amalgamations of Census tracts). The more detailed measurement work was 
confined to 80 of these so-called neighbourhood clusters. Sampson et al. 
(1999) propose three scales that are potential predictors of child outcomes: 

i) ‘Intergenerational closure’ - are the adults and children in a 
community linked to one another? Essentially, this is whether the 
parents in a neighbourhood know their children's friends and the 
parents of these friends. 

ii) ‘Reciprocated exchange’ - what is the intensity of interfamily and 
adult interaction with respect to childrearing? For example, do 
parents and others in the neighbourhood do favours for each other? 

iii) ‘Informal social control and mutual support’ - do residents intervene 
on behalf of children, both to support them and to act to reinforce 
limits in terms of their behaviour? 

Responses to the items that make up scales of this kind are then aggregated 
over individuals to create measures of the neighbourhood. In order for the 
measures to be both reliable and valid, samples within neighbourhoods need 
to be selected randomly and to be of a reasonable size, and there needs to be 
at least some degree of agreement between the responses of individuals 
within neighbourhoods. 
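
The aggregation step, and one way of checking agreement within neighbourhoods, can be sketched in Python. The example is an assumption-laden illustration rather than the PHDCN method: the scores are invented, every neighbourhood is given the same number of respondents, and the agreement statistic used is the one-way ANOVA intraclass correlation, chosen here as a familiar option and not taken from the source.

```python
from statistics import mean

# Invented scale scores from three respondents in each of three neighbourhoods.
responses = {
    "area_a": [4.0, 5.0, 4.0],
    "area_b": [2.0, 1.0, 2.0],
    "area_c": [3.0, 3.0, 4.0],
}


def neighbourhood_means(data):
    """Aggregate individual responses into one score per neighbourhood."""
    return {area: mean(scores) for area, scores in data.items()}


def icc1(data):
    """One-way ANOVA intraclass correlation for equal-sized groups."""
    groups = list(data.values())
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal-sized groups")
    n_groups = len(groups)
    grand = mean(x for g in groups for x in g)
    # Mean squares between and within neighbourhoods.
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n_groups - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


area_scores = neighbourhood_means(responses)
agreement = icc1(responses)
```

A value of the statistic near 1 indicates that respondents in the same neighbourhood give similar answers, so the neighbourhood mean is a meaningful summary; a value near 0 suggests the aggregate says little about the area itself.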

As well as interviewing local residents to measure perceptions of local areas, 
Raudenbush and Sampson (1999) discuss the advantages of what they call 
systematic social observation scales, obtained by observers travelling around 
areas, taking notes and using videos to measure the extent of social and 
physical disorder in terms of, for example, drug dealing, prostitution, 
uncollected rubbish and graffiti. These methods, and the associated statistical 
techniques needed to estimate reliabilities, have been labelled ‘ecometrics’ by 
Raudenbush and Sampson (1999). 

In the UK, measurement of local neighbourhoods has used network analysis 
(Mitchell, 1969), drawing on theories about reciprocity (Bulmer, 1986) and 
data from surveys of mutual aid provided by kin, friends and neighbours (e.g. 
Willmott, 1986; 1987; see also the 'social capital' module in the 2000 General 
Household Survey). 



4.3 Analysis issues 



There are several issues that any analysis of neighbourhood effects needs to 
address. The first is the one raised in Chapter 2 - given that family and local 
contexts are correlated, is it possible to obtain separate estimates of them? It 
seems most likely that impacts are mediated through different family and 
neighbourhood variables, as Rutter et al. (1998, pp. 199 onwards) argue in 
their review of poverty and social disadvantage. A related question is whether 
the direct effect of neighbourhood can be disentangled from the effects of 
services such as schools provided for that neighbourhood. If not, then there is 
a danger of ascribing effects to variables such as a concentration of poverty 
when really the effect can be explained by the quality of local schools (which 
might be correlated with area poverty rates). We discuss the question of 
service quality in Section 4.4. These two questions raise some technical issues 
which, along with others, are discussed in Appendix 3 and are summarised 
below. 

A third question is whether neighbourhood effects vary in importance across 
the range of child outcomes set out in Chapter 3. Linked to this is the 
possibility that effects will vary in size according to the age of the child. It may 
be unlikely that there will be strong neighbourhood effects on children’s early 
school attainments (when family influences will still be strong) but plausible 
that these effects could be stronger in adolescence (when the influence of the 
family declines in importance and the influence of peer groups increases). 
This hypothesis is not, however, supported by McCulloch and Joshi (2001b). 

A fourth question relates to geographical mobility. Is it the neighbourhood of 
current residence that is most important for adolescent outcomes, or the 
neighbourhood experienced by the child when younger? Again, the answer 
could vary according to the outcome under consideration, possibly being 
different for crime than for health. One might also expect to observe 
‘dose-response’ associations, with stronger effects for children as length of 
residence in the neighbourhood increases. 

Finally, in order to reach a fuller understanding of the effects of 
neighbourhoods on child outcomes, we need to see what happens to these 
outcomes as neighbourhoods change, or as the child moves from one 
neighbourhood to another. Are there positive effects on outcomes if a 
neighbourhood improves or if a child moves from a disadvantaged area to an 
advantaged one, even if the family’s own circumstances do not change? This 
implies that, as well as longitudinal data on children and their families, we also 
need longitudinal data on neighbourhoods. Administrative data collected over 
time are beginning to yield consistent information on small areas, and also 
something about the groups who move in or out of such areas, as they can be 
identified in national administrative data. Data from evaluation studies may 
throw light on the impact of changing key features, and whether these have 
variable effects in different areas or for different groups. 

Summarising Appendix 3, the crucial issue, when wishing to get good 
estimates of neighbourhood effects, is to collect and model data on outcomes 
and their correlates for individual children. Models based solely on ecological, 
or aggregate, data are misleading. Ideally, these individual data should be 
obtained from a clustered, or multilevel, design so that between-area variability 
can be separated from between-individual, within-area variability, and both 
these sources of variation can then be modelled statistically. There will always 
be an element of doubt about the validity of neighbourhood effects because 
where people live is not the result of random distribution but a mix of choice, 
constraint or sometimes policy (to disperse particular groups) and these 
factors are likely to be related to outcomes for their children. This problem can 
be alleviated by including in a model a range of measures at the individual 
and family levels. 
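
A small sketch may help to show why aggregate data alone can mislead. It 
separates the association between a predictor and an outcome into a 
within-area slope (individuals compared with others in their own area) and a 
between-area slope (area means compared with one another). It is an 
illustration only, not a substitute for the multilevel models discussed in 
Appendix 3, and the function and variable names are hypothetical. 

```python
from statistics import mean


def within_between_slopes(records):
    """Split an x-y association into within-area and between-area slopes.

    `records` is a list of (area_id, x, y) tuples for individual children.
    Returns (within_slope, between_slope).
    """
    by_area = {}
    for area, x, y in records:
        by_area.setdefault(area, []).append((x, y))
    area_means = {
        a: (mean(x for x, _ in rows), mean(y for _, y in rows))
        for a, rows in by_area.items()
    }

    # Within-area slope: deviations of individuals from their own area mean.
    sxy_w = sum(
        (x - area_means[a][0]) * (y - area_means[a][1])
        for a, rows in by_area.items() for x, y in rows
    )
    sxx_w = sum(
        (x - area_means[a][0]) ** 2
        for a, rows in by_area.items() for x, _ in rows
    )

    # Between-area slope: what an analysis of aggregate data alone would see.
    grand_x = mean(mx for mx, _ in area_means.values())
    grand_y = mean(my for _, my in area_means.values())
    sxy_b = sum((mx - grand_x) * (my - grand_y) for mx, my in area_means.values())
    sxx_b = sum((mx - grand_x) ** 2 for mx, _ in area_means.values())

    return sxy_w / sxx_w, sxy_b / sxx_b
```

Where the two slopes differ in size, or even in sign, an ecological analysis 
would give a misleading picture of the relationship for individual children. 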



4.4 Measures of quality of local services 



A further part of the jigsaw of identifying neighbourhood effects would be to 
incorporate some measures of access, both physical and psychological, to 
local services and the quality (and, by extension, service effectiveness) of 
those services as they affect children. 

The ID 2000, Welsh Index of Multiple Deprivation 2000 and the Measures of 
Deprivation for Northern Ireland each contain a domain of deprivation entitled 
‘Geographical Access to Services’, which gives a ward/Electoral Division level 
score for people’s access to certain key services. It would, in principle, be 
possible to produce ‘child oriented’ access domains at a small area level that 
measured access to services that are most relevant to children of whatever 
selected age range (for example, the English ID 2000 has an indicator of 
access, measured by distance, to primary schools for 5-8 year olds). Some of 
the main issues that would need to be considered include selection of 
appropriate services; adequate measurements of distance (in the ID 2000 and 
the Welsh IMD 2000 access was measured ‘as the crow flies’ whereas it was 
possible in the Northern Ireland Measures of Deprivation to refine this to 
measure distance by road); availability and cost of public and private 
transport; and issues of cultural or physical accessibility (e.g. for disabled 
people). 
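
As an illustration of the 'as the crow flies' measure, the sketch below 
computes the great-circle distance from a home location to the nearest of a 
set of services. It assumes that locations are available as latitude and 
longitude in decimal degrees and uses the haversine formula with a mean Earth 
radius; the indices themselves may have been computed differently (for 
example from grid references), and the function names are hypothetical. 

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def crow_flies_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_service_km(home, services):
    """Distance from `home` (lat, lon) to the closest of `services`."""
    return min(crow_flies_km(home[0], home[1], s[0], s[1]) for s in services)
```

A road-distance measure of the kind used in Northern Ireland would need a 
road network dataset and a routing step, which this sketch does not attempt. 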

There are clearly increasing amounts of data purporting to assess quality of 
service. Typically these are professional ratings or other indicators that focus 
on an institution - for example OFSTED reports on schools or preschool 
facilities. In certain cases there may be rating scales or performance data. 
League tables of schools - based on pupils’ performance in tests and exams - 
and hospitals - based on mortality data - purport to measure quality. However, 
their failure to adjust for the intake characteristics of pupils (in the case of 
schools) and the caseload of patients (in the case of hospitals) renders 
doubtful their value as indicators of quality. As noted in Chapter 2, though, 
there are moves to improve this aspect of school based assessment as linked 
individual pupil performance data become available. This issue is considered 
in some detail by Goldstein and Spiegelhalter (1996) and in the associated 
discussion (see also Goldstein, 2001). 
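
The idea of adjusting for intake can be illustrated with a very simple 
'value added' calculation: regress pupils' outcome scores on their intake 
scores across all schools, then average the residuals within each school. 
This is a deliberately crude sketch using a single pooled regression; it 
ignores the clustering of pupils and the uncertainty around each school's 
estimate, both of which matter in practice, and the names used are 
hypothetical. 

```python
from statistics import mean


def value_added(pupils):
    """Mean residual by school after adjusting outcome for intake score.

    `pupils` is a list of (school_id, intake_score, outcome_score) tuples.
    Returns a dict mapping school_id to its mean residual.
    """
    xs = [intake for _, intake, _ in pupils]
    ys = [outcome for _, _, outcome in pupils]
    mean_x, mean_y = mean(xs), mean(ys)

    # Pooled ordinary least squares fit of outcome on intake.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x

    # A school's value added is how far its pupils sit above or below the line.
    residuals = {}
    for school, intake, outcome in pupils:
        residuals.setdefault(school, []).append(outcome - (intercept + slope * intake))
    return {school: mean(r) for school, r in residuals.items()}
```

A school with high raw results but a favourable intake may show little or no 
value added on this basis, which is the point made above about unadjusted 
league tables. 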

It would, however, be a major task to unbundle some of this information 
routinely to a local area, though as information builds up about usage and 
catchment area it might be technically possible to link this information on 
quality to local areas. In other cases it might be possible to link data on 
individuals directly to the institution to which they are attached. 

In cases where service use is more intermittent or perhaps periodic (e.g. 
hospital, dentist, day-care centre), the issue might be as much one of access 
as of quality. But, in principle, it should be easier to link such information to 
individual surveys to fill out individual-level data with some information on 
institutional quality (of, for example, school attended). However, this is far 
from routine and opens up questions of ethics and data protection (discussed 
in Chapter 5), if there is to be any direct data linkage at this level. 



4.5 Data 

Notes on the main datasets are tabulated in Appendix 2 (see Section 3.6 for 
more details). At this point we simply list some relevant aspects of the major 
surveys that allow for the possibility of measuring ‘neighbourhood effects’. 

ALSPAC (Avon Longitudinal Study of Parents and Children) 

Because this study initially included all births in a relatively small area, the 
sample is highly clustered by whatever aggregation is chosen. As it also 
includes data on primary schools attended and health services used, it would 
be possible to separate out different local contextual effects. Data are well 
postcoded. 

BHPS (British Household Panel Survey) 

The initial sample was clustered by postcode with, on average, about 30 
respondents in each of 250 postcode sectors. 

EDUCATION MAINTENANCE ALLOWANCE EVALUATION 

This study of the EMA Pilot Areas, which were selected because of high rates 
of deprivation and low staying on rates, was based on a random sample of 
10,000 16/17 year olds in the 10 pilot areas and 11 matched control areas. 
Individual respondents were also matched in the analysis. One conclusion 
from the interim first year results (Ashworth et al., 2001) was that the overall 
positive effects varied for different groups of young people and areas. 

MENTAL HEALTH OF CHILDREN AND ADOLESCENTS SURVEY 

The initial sample was clustered by postcode with, on average, about 25 
respondents in each of 475 postcode sectors. 

MILLENNIUM COHORT STUDY 

The sample is clustered by ward, with 200 wards in England, 73 in Wales, 63 
in Northern Ireland and 62 in Scotland. The wards are stratified by a measure 
of child poverty, and it is intended to collect both respondents' views about 
their neighbourhood and aggregate measures. 

SoLIF/SoF (Survey of Low Income Families/Families with Children) 

The initial sample was clustered by postcode with, on average, about 30 
respondents in each of 150 postcode sectors. The restricted nature of the 
sample, to low income families at least for waves 1 and 2, means that analysis 
of neighbourhood effects would be problematic. 

SURE START EVALUATION 

The impact study will be clustered by local Sure Start project area. Although it 
has not yet been decided how many of these areas will be included, it is 
unlikely to be fewer than 100. 







INDICES OF DEPRIVATION 2000 (ID 2000) 

The English ID 2000 were produced at ward level using ward boundaries at 
April 1998. For each of the 8,414 wards in England (wards in the City of 
London were combined, as were wards in the Isles of Scilly) there is a score 
and rank for income deprivation; employment deprivation; education, skills 
and training deprivation; health deprivation and disability; housing 
deprivation; geographical access to services deprivation; child poverty; and 
the Index of Multiple Deprivation. In addition, there are six district-level 
summaries of the Index of Multiple Deprivation. 

WELSH INDEX OF MULTIPLE DEPRIVATION 2000 

The Welsh IMD 2000 was produced at Electoral Division (EDiv) level. For each 
of the 865 EDivs in Wales there is a score and rank for income deprivation; 
employment deprivation; education, skills and training deprivation; health 
deprivation and disability; housing deprivation; geographical access to 
services deprivation; child poverty; and the Index of Multiple Deprivation. 

MEASURES OF DEPRIVATION FOR NORTHERN IRELAND 

The Measures of Deprivation were produced at ward level, using boundaries 
existing at the time of the 1991 Census. For each of the 566 wards there is a 
score and rank for income deprivation; employment deprivation; education, 
skills and training deprivation; health deprivation and disability; 
geographical access to services deprivation; housing stress; social 
environment deprivation; child poverty; and the Multiple Deprivation 
Measure. There are six Local Government District summaries of the Multiple 
Deprivation Measure. In addition there are ED level measures of income 
deprivation, employment deprivation, and economic deprivation, with two 
ward-level summaries of the economic deprivation measure. 

THE CENSUS OF POPULATION 

The 1991 Census of Population will soon be superseded by the 2001 Census 
of Population which should be released in 2002/3. The most relevant 
subdivision will be the census Output Areas (OAs), broadly equivalent to 
Enumeration Districts in the previous census, with targets of 100-125 
households in England, Wales and Northern Ireland and fewer in Scotland. 
Unlike the previous ED divisions that do not have socially meaningful 
boundaries, the 2001 OAs will be defined in ways that take some account of, 
for example, housing tenure. 







CHAPTER 5 - DATA LINKAGE: TECHNICAL, LEGAL AND ETHICAL 
ISSUES 



5.1 Introduction 

In this chapter we review the issue of data linkage at both the individual and 
small area aggregate levels. This raises technical as well as legal and ethical 
issues. At one extreme there are examples where extensive data systems, 
depending on data linkage, have been set up. For example, Leeds City 
Council operates a system to link data at an individual address level across 
many different departments, including free school meals, income support and 
housing benefit, educational performance etc. At the other extreme, there are 
cases where researchers working for local authorities have been stopped from 
linking data, for example a project that set out to link the school/education 
based free school meals data with the Housing Benefit system. While this 
example appears to be anomalous, other cases may be explained by the 
different powers, formal access rights to data etc held by different groups 
seeking to analyse the data. Thus, what may be possible for central 
government departments (or in some cases for one department but not for 
others) - and possibly by extension to the ‘agents’ of this department 
(including researchers working under contract) - may not be possible for 
researchers acting on their own account, or under charitable or research 
council support. These complexities make it very hard to indicate any general 
guidelines, though at one extreme there should be no blanket refusal unless 
there are specific reasons (e.g. legal restrictions - see below). 

In our review, we came across three major initiatives within government that 
have focused on the practical, legal and ethical issues of data linkage, all 
from slightly different perspectives. These are: 

(i) the Policy Action Team 18 report Better Information (SEU, 2000), including 
the helpful contribution made by the then Data Protection Officer (now 
Information Commissioner) on the use of data. The concern of this group was 
to explore the possibility of establishing more up to date and comprehensive 
datasets to throw light on the problems of deprived neighbourhoods. 
Stemming from this are the various developments following PAT18 within 
government, including the major ONS ‘Neighbourhood Statistics’ initiative - 
see: 

http://www.statistics.gov.uk/neighbourhood/home.asp 

(ii) the GSS ‘task group’, initially at the DfEE, now at ONS, on the issue of 
linking administrative data within government. This group has focused on 
setting guidelines and procedures for data linkage within government. The 
group has commissioned a lengthy technical review published by ONS (Gill, 
2001). While this review covers ethical and legal issues its central focus is 
technical data matching problems. 

(iii) a high-level advisory group run under the auspices of the Performance 
and Innovation Unit (PIU) in the Cabinet Office, and chaired by Lord Falconer, 
into ‘Privacy and Data Sharing’. The group has a number of external 
experts/interested parties. The minutes of this group have been made public 
on its web site at: 

www.cabinet-office.gov.uk/innovation/2000/privacy/datascope.shtml 

although its final report has yet to be released. The focus of this committee is 
more on the general issues raised by record linkage and privacy not only 
within government but also more generally in the commercial sector, and the 
interface between the two. The same theme, a lack of clear guidance on or 
understanding of what is being done and what can and should be done, 
runs through the minutes of this group. The final set of minutes (18th April 
2001) indicates that the thrust is towards setting guidelines and principles in 
the balance ‘between privacy and data use’ and establishing greater 
‘transparency’ over different aspects. 

What follows can only be a partial coverage of this very wide territory, and not 
in any sense final guidance in a very complex field. We begin by looking 
practically at what is happening now (with a few selected examples), and then 
what may be possible in the near future. We briefly address the legal and 
ethical issues in the final section. 



5.2 Current position on data linkage 



In one sense there is nothing new about data linkage. Many studies have 
collected and combined information from different sources, for example cohort 
studies such as the National Child Development Study (NCDS) collected data 
from schools and teachers using schedules sent out to schools with NCDS 
subjects in them. And some form of automatic or semi-automatic data linkage 
is not that new either. Gill (2001), in his comprehensive review of data linkage 
techniques, reminds us that the Oxford Record Linkage project, originally 
used to link patient records automatically across hospitals in the Oxford area, 
dates from the late 1960s/early 1970s. Gill’s account defines the term ‘data 
linkage’ as essentially the combination of two or more different records ‘that 
are believed to belong to the same person, family or entity’ (p.13). The ONS 
Longitudinal Study, based on a 1% sample of census records since 1971, is 
linked into the national registers of births, deaths and new cases of cancer. 

What, however, is new in the last few years is that central records have 
increasingly been computerised and systematised, and therefore indexing and 
linking schemes have been built up for management purposes. Thus many 
local authority housing benefit systems (e.g. the ICL HBIS) have a unique 
numbering scheme at individual level that potentially allows individuals 
(including children) to be tracked across benefit units and linked over time if, 
for example, there are repeat claims at a later date. These are primarily for 
management purposes (for example, to avoid duplication, prevent fraud etc). 
And nationally in recent years, very elaborate schemes have been developed 
as part of the Benefits Agency’s ‘Generalised Matching Service’ (BA-GMS) to 
detect possible fraudulent benefit claims. These link together national benefit 
and other systems, including data from a very wide range of government and 
other sources to identify anomalies in claiming, or to track individuals where 
fraud may be suspected. These records, of course, include full identifiers, 
which can be used as the search mechanism. While the intent is to track 
individual cases, the result has often been to build up impressive and 
comprehensive sets of data. Thus, the related Housing Benefit Matching 
Service extracts a full set from the Housing Benefit/Council Tax Benefit 
system from more or less every district council in GB, every three months, at 
an individual claimant level. These data are then cleaned and turned into a 
standard format file (Local Authorities use many different commercial 
non-standard packages). While these data are collected for anti-fraud purposes they can, 
in principle, be used for other objectives (e.g. tracking down under-claiming). 

A final example of the development of data linkage for predominantly 
management purposes is in the evaluation of the ‘ONE’ benefit delivery 
project at the DSS, where information on different benefits has been brought 
together into a single benefit administration system. 

These three examples of developments predominantly for management 
purposes demonstrate the sheer volume of such information now collected 
and the potential scope for linking this information together for research 
purposes, and specifically on the question of child poverty and child 
outcomes. But only very recently has this potential been exploited by 
researchers. 



5.3 Types of data linkage 



5.3.1 Administrative data to administrative data at individual level 

This could involve either linking extracts from different time points to one 
another at an individual level to create a longitudinal database, for example 
using NINOs (encrypted in a standardised way) to link together extracts of, 
say, Income Support. Or the scope could be extended by linking across data 
extracts from different benefit or tax credit systems either cross-sectionally or 
longitudinally, e.g. fitting in WFTC cases to IS and JSA-IB which would cover 
a very large proportion of low-income households containing dependent 
children. Again, longitudinal analysis would require a matching variable, 
typically a NINO, though in principle, within government or for those working 
for government, it would be possible to match using the type of matching 
techniques described in detail by Gill (2001) where names and addresses or 
other identifiers can be used. 
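
The general approach of linking extracts on a standardised, encrypted 
identifier can be sketched as follows. The sketch is illustrative only: it 
uses a keyed hash (HMAC-SHA256) as a stand-in for whatever encryption the 
data holders actually apply, assumes each record is a simple dictionary with 
a 'nino' field, and the function names are hypothetical. 

```python
import hashlib
import hmac


def pseudonymise(nino, key):
    """Return a keyed hash of a NINO, ignoring spacing and letter case."""
    normalised = "".join(nino.split()).upper()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def link_extracts(earlier, later, key):
    """Link two extracts (lists of dicts with a 'nino' field) on the keyed hash.

    Returns (stayers, leavers, joiners): records present at both time points,
    only at the earlier one, and only at the later one. The raw NINO is
    dropped from every record that is returned.
    """
    def index(extract):
        return {
            pseudonymise(r["nino"], key): {f: v for f, v in r.items() if f != "nino"}
            for r in extract
        }

    first, second = index(earlier), index(later)
    stayers = {pid: (first[pid], second[pid]) for pid in first.keys() & second.keys()}
    leavers = {pid: first[pid] for pid in first.keys() - second.keys()}
    joiners = {pid: second[pid] for pid in second.keys() - first.keys()}
    return stayers, leavers, joiners
```

Because the same key and the same normalisation are applied to every extract, 
the same person receives the same pseudonym at each time point, which is what 
allows a longitudinal file to be built without the analyst seeing the NINO. 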

We give some examples of recent developments in this area that indicate 
what is practically possible. 

(i) Linking Housing Benefit data extracts at Local Authority level 

In a study of welfare dynamics among lone mothers, Noble, Smith et al. 
(1998) linked together seven individual-level extracts of Housing 
Benefit/Council Tax Benefit (HB/CTB) over a three year period in one large 
district authority where individual reference numbers were used. This allowed 
both longitudinal analysis and potentially follow-up of individual household 
members where households had re-formed during the time period, provided 
they remained on benefit in the same Local Authority. 

An extension of this method of data linkage is reported by Platt (forthcoming), 
who used HB/CTB data from a very large metropolitan district extracted 
quarterly over an 18-month period and traced the patterns for dependent 
children by ethnic group. The individual numbering system used in this 
authority allowed individual children to be tracked and not just benefit units. 
This HB/CTB system was also (unusually) ethnically coded, allowing the 
results for different ethnic groups of children to be followed up. She was able 
to demonstrate different patterns of welfare dynamics for different ethnic 
groups. She also looked specifically at those approaching 16 to measure their 
subsequent pathways, thus illuminating something about the likelihood of 
such children themselves moving on to means tested benefits in their own 
right. 

In both cases the data are fully postcoded, potentially allowing patterns of 
welfare dynamics to be analysed with a local area dimension. 

(ii) Linking national benefit datasets over time 

The DSS 5% Quarterly Statistical Enquiry (QSE) samples of the major means 
tested benefits report trends and numbers and types of benefit recipients 
down to local district level. Noble, Evans et al. (2001) report a study for the 
SEU where a full 100% extract of DSS benefit (IS, JSA-IB) data was linked 
using standardised encrypted NINOs. This study had annual data extracts 
covering the period 1995-1998, though only the data for 1995 and 1998 were 
analysed. Since 2001 the study has been extended to data for 2000 giving a 
five-year span. As the data are also very well postcoded (better than 99%) the 
study was able to track change over time at the local district level and to 
examine both geographical movement and movement between certain benefit 
categories for the two time points. Because of its 100% coverage, virtually all 
areas contain significant numbers of claimants allowing reliable ward level 
figures to be produced. So far only IS and JSA-IB benefits have been used in 
this way. The shift from Family Credit administered by the DSS to WFTC 
administered by Inland Revenue has meant that WFTC data have not, to date, 
become available for this purpose at the 100% level (see below for some of the 
reasons). 

(iii) Lifetime Labour Market Database (LLMD) 

Using National Insurance (NI) data from the old NI (NIRS) computer, 
researchers at the DSS have constructed an 18 year panel for a 1% sample of 
the caseload using their NI contributions record to build up information on 
labour market experience (see Ball and Marland, 1996 for an early output 
from the LLMD). The NI contributions give some indication of earnings levels, 
type (class) of contributions, pension and NI credit arrangements etc. This is 
for the whole working age population. Subsequently this NI information has 
been linked to the New Earnings Survey (NES) panel data where the 1% 
extract is based on the same sampled digits from the NIRS extract, allowing 
matching via the NINO. The NES gives earnings data and industry. A further 
development is to link these data into information extracted from the 5% 
sample for the DSS QSE as this sample includes the same NINO digit 
selection. Data on IS and subsequently JSA-IB are available since 1992. This 
has been extended to other datasets such as JUVOS (unemployment data, 
see below). This growing dataset is potentially a powerful way of looking at 
long-term income dynamics, but at this stage it would appear to contain 
relatively little other information relevant to child poverty and child outcomes. 
Postcoding is apparently very limited as the data are mainly supplied by 
employers through their annual Tax and NI returns (form P14). However, it is 
a useful example of the way that data can be built up through linkage to 
address key policy questions (for example, information on pensions using 
lifetime earnings data). 
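
The reason that samples drawn on the same NINO digits can be matched is that 
the smaller sample is nested within the larger one. The sketch below 
illustrates the principle only. The particular digit endings are invented 
for the example and are not the ones used for the LLMD, NES or QSE samples. 

```python
# Hypothetical digit endings, chosen only to illustrate nesting.
ONE_PERCENT_ENDINGS = {"14"}
FIVE_PERCENT_ENDINGS = {"14", "34", "54", "74", "94"}


def final_digits(nino, n=2):
    """Return the last `n` numeric digits of a NINO, ignoring letters."""
    digits = "".join(ch for ch in nino if ch.isdigit())
    return digits[-n:]


def in_sample(nino, endings):
    """True if the NINO's final two digits fall in the selected endings."""
    return final_digits(nino) in endings
```

Any case selected for the 1% sample under this rule is also selected for the 
5% sample, so records from the two sources can be brought together via the 
NINO; cases in the 5% sample with other endings have no counterpart in the 
1% sample. 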

(iv) Joint Unemployment and Vacancies Operating System (JUVOS) 

The JUVOS cohort is a similar 5% sample of claims for unemployment related 
benefits using the NINO as the sampling mechanism. This forms a 
longitudinal database going back to 1982/3, which is updated on a continuous 
basis. This allows analysis, including event history analysis, of length of spells 
of unemployment, number of spells and intervals between them. It has been 
used as a sampling frame for further studies and as a benchmark for 
evaluating special programmes. Most recently, it has been used in the 
‘macro-evaluation’ of the New Deal for Young People (White, 2000) by providing 
comparable data on young people not in the New Deal. The JUVOS data (or 
Claimant Count System) are limited to claim related information, plus usual 
occupation and marital status. They also contain information about the 
reasons for a claim ending. There is no information about other household 
members or children. 
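
The kind of spell-based summary that such a database allows can be sketched 
as below. The sketch assumes that each claim is recorded as a start date and 
an end date for one claimant, with all spells completed; a proper event 
history analysis would also need to treat claims still open at the end of 
the observation period as censored. The function name is hypothetical. 

```python
from datetime import date


def spell_summary(spells):
    """Summarise one claimant's completed spells of unemployment.

    `spells` is a list of (start_date, end_date) pairs. Returns the number
    of spells, the length of each spell and the gap between consecutive
    spells, all in days.
    """
    ordered = sorted(spells)
    lengths = [(end - start).days for start, end in ordered]
    gaps = [
        (ordered[i + 1][0] - ordered[i][1]).days
        for i in range(len(ordered) - 1)
    ]
    return {"n_spells": len(ordered), "lengths_days": lengths, "gaps_days": gaps}
```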

(v) New Deal Databases 

There is now a large number of New Deal databases administered by the 
Employment Service, which include 100% scans of administrative data. These 
draw on data from the overall Labour Market System (LMS) and are also 
linked to the JUVOS system and with other relevant government datasets 
such as benefit data using the NINO. Some research use has been made of 
these datasets to evaluate the New Deal programmes and, for example, to 
contribute to the Index of Multiple Deprivation. Currently these databases 
contain limited information relevant to linking child poverty and child 
outcomes. However the New Deal for Lone Parents and New Deal for Young 
People are likely to be the most relevant. 

5.3.2 Administrative data to administrative data at aggregate levels 

Administrative data can also be linked to aggregate information either on an 
area basis, for example by using postcoding, address or other locational data, 
or it could be at an institutional level. Thus within DfES, there are individual 
pupil records from SATs, GCSEs and other examinations, but there is also a 
mass of school based information from the annual school census and other 
sources. As the move to introduce a unique pupil number (UPN) nationally 
gains ground it will be possible to build up a database of individual pupil 
performance data together with school data and also some indication of 
school-level performance variables (e.g. simple value added estimates). 
Typically pupil results contain a school reference code (DfES No.). They are 
not currently postcoded to home address in England or Wales, but the 
equivalent data in Northern Ireland at post-primary level are all individually 
well postcoded (better than 99%). Many schools that use the SIMS (school 
information management system) in England, however, already include 
address files with individual postcoded records for all their pupils. There are 
research projects that have drawn on this information (Gibson and Asthana, 
1998; Smith et al., 1999). Also, from 2002, school census data will include 
postcode of residence and also ethnic group for individual pupils (currently 
only available in school aggregate form). 

Information about school quality from OFSTED inspections can also be linked 
through the DfES number to individual schools. Published reports do not 
include the standard rating scales used by the inspection team to rate the 
main dimensions of school quality. However, these have been used by 
researchers to link to area, pupil and school-based data. 

5.3.3 Linkage between survey data and administrative data 

Administrative data can be linked at an individual level to survey data or in the 
form of aggregate neighbourhood or service quality data. Clearly the 
opportunity for such linkage is greatest at the data collection point by the data 
collectors as they will have access to the individual ID and can seek the 
necessary informed consent. Subsequent linkage at an individual level would 
be difficult because of the likely absence of crucial identifiers (though see Gill, 
2001, for matching on a probabilistic basis). Matching pupil postcode records 
with pupil examination data, where there was a date of birth and gender flag in 
both datasets and the data were grouped by individual secondary school or 
exam centre number, produced very few ambiguous matches at school 
level (Smith et al., 1999). 
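
A deterministic match of this kind, on school, date of birth and gender, can 
be sketched as follows. This is not the procedure used by Smith et al. 
(1999), only an illustration of the principle; the field names ('school', 
'dob', 'sex') and the function name are hypothetical. 

```python
from collections import Counter, defaultdict


def match_within_school(postcode_records, exam_records):
    """Match two pupil files on (school, date of birth, sex).

    Both inputs are lists of dicts carrying 'school', 'dob' and 'sex'.
    Returns (matched, ambiguous, unmatched), where `matched` holds pairs of
    (postcode_record, exam_record) and the other two hold postcode records.
    """
    def key(record):
        return (record["school"], record["dob"], record["sex"])

    exam_index = defaultdict(list)
    for record in exam_records:
        exam_index[key(record)].append(record)
    postcode_counts = Counter(key(r) for r in postcode_records)

    matched, ambiguous, unmatched = [], [], []
    for record in postcode_records:
        candidates = exam_index.get(key(record), [])
        if not candidates:
            unmatched.append(record)
        elif len(candidates) == 1 and postcode_counts[key(record)] == 1:
            # Exactly one pupil on each side shares this key.
            matched.append((record, candidates[0]))
        else:
            # Two or more pupils in the school share birth date and sex.
            ambiguous.append(record)
    return matched, ambiguous, unmatched
```

Grouping by school keeps the pool of candidates small, which is why so few 
pupils share the same date of birth and gender; ambiguous cases would need 
further identifiers or the probabilistic methods described by Gill (2001). 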

Survey records have also been successfully matched to subsequent 
administrative data. Thus, following a detailed survey of claimants answering 
a screening question on limitations to their mobility, their administrative 
records were studied to assess how many subsequently successfully claimed 
Disability Living Allowance (Noble and Daly, 1996; Noble and Platt, 1997). In 
this case, the Data Protection Registrar’s advice was sought about the 
subsequent data linkage. 

Matching survey data to area based or institutional service quality data 
requires linking variables. Typically the full postcode would link to local 
geographies, though as noted above postcodes do not remain invariant. 
Institutional number (e.g. school DfES No.) would be a way of linking to other 
datasets, though again these can change (through school closure or merger) 
and there are not always reliable look up tables to match old and new lists. 
Recent examples of such matching would include linking survey data to ward 
level index of deprivation scores (ID 2000). The spread of up-to-date postcode 
directories, and powerful postcoding packages that work with varied address 
formats means that there should in principle be little technical difficulty in 
making the match when using current data. Historical data are potentially 
much more of a problem because changes in boundaries, postcodes etc. may 
not have been adequately recorded. 
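
A minimal sketch of such a match is given below, assuming a postcode-to-ward 
look-up table and a table of ward-level scores are already to hand. The 
function names, field names and any values used with them are hypothetical and 
are there only to show the mechanics, including the need to report records 
that fail to link. 

```python
def normalise_postcode(raw):
    """Upper-case and strip all spaces so that varied formats compare equal."""
    return "".join(raw.split()).upper()


def attach_area_score(survey_rows, postcode_to_ward, ward_scores):
    """Attach a ward code and ward-level score to each survey record.

    postcode_to_ward: dict keyed by normalised postcode (hypothetical look-up).
    ward_scores: dict keyed by ward code (hypothetical score table).
    Records whose postcode or ward cannot be found are returned separately
    rather than silently dropped.
    """
    linked, unmatched = [], []
    for row in survey_rows:
        ward = postcode_to_ward.get(normalise_postcode(row["postcode"]))
        if ward is None or ward not in ward_scores:
            unmatched.append(row)
        else:
            linked.append({**row, "ward": ward, "deprivation_score": ward_scores[ward]})
    return linked, unmatched
```

With historical data the unmatched list is likely to be the more informative 
output, since it shows how far boundary and postcode changes have eroded the 
link. 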

5.3.4 Aggregate data to aggregate data 

This would typically require matching ward codes, postcodes, ED (or OA) 
codes or other geographical referencing. As noted above, changing 
boundaries are not always well recorded. However, there is a growing number 
of studies that have linked different geographies together over time, though 
there is inevitably some degree of smoothing in the process. Examples would 
include ward-to-ward linkage across different censuses and the development 
of look-up tables for this purpose (e.g. by Wilson and Rees, 1998). More fine 
grain links have been made between census enumeration districts in 1981 
and 1991. 



5.4 Future prospects 



Some of the data linkage possibilities have been listed above. This is a rapidly 
developing field. Until about 1996/7 the idea of routinely extracting and 
analysing data from the IS/JSA-IB would have been both technically at the 
edge of possibility and also firmly ruled out by administrative decision. 
Exceptionally, a study supported by the DSS and carried out at the Centre for 
Research in Social Policy (CRSP) at Loughborough, using a paper extract of 
local DSS case level data, demonstrated the potential of such analysis 
(Dobson et al., 1996). The climate has altered significantly since 1997 as the 
advantages and power of such administrative data are realised. It seems that 
such analysis for research purposes, given appropriate safeguards, can be 
undertaken in ways that meet the requirements of the Data Protection Act. 
There are several projects underway that are exploring the more extensive 
use of such data (over longer time periods and involving more datasets). 



Two further areas of possible development should be flagged. One has 
already been proposed and is currently under development, the other is a 
possible future development. 

5.4.1 Programme evaluation using administrative data linked over time 

At least one proposal to evaluate a New Deal programme has suggested 
drawing on a range of such administrative data in individual form to throw light 
on the overall impact of the programme at the local level. This would be a 
complement to more focused survey or observational studies on programme 
effects. The intention would be to build up a range of individual-level data from 
the IS, JSA-IB, WFTC, JUVOS and other relevant administrative data 
sources, including education and training databases if these were part of the 
targeted outcomes of the programme. The programme areas would have to 
be identified geographically and, using some form of matching, control or 
contrast areas would be selected (though randomisation could in principle be 
employed if target areas had not already been identified). The administrative 
data could then be used to monitor changes in the control and experimental 
areas over time. Crucially, by matching individual-level data, it would be 
possible to say something about geographical inflows and outflows to the 
area, which could be exceptionally difficult to pick up by other means. Also, 
importantly, the administrative data could in principle be used to say 
something about the prior conditions (if such data were available before the 
intervention began). This is already possible as such data exist nationally 
effectively since 1995 in a form that allows small area classification. 
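
The point about inflows and outflows can be made concrete with a short 
sketch. It assumes two extracts of individual-level data, each reduced to a 
mapping from a person identifier to an area code; the identifiers, area codes 
and category names are hypothetical. 

```python
def area_flows(before, after, area):
    """Classify individuals relative to one area between two extract dates.

    before, after: dicts mapping person ID -> area code at each date.
    People absent from one extract are kept apart from movers, because
    leaving the administrative system is not the same as leaving the area.
    """
    residents_before = {pid for pid, code in before.items() if code == area}
    residents_after = {pid for pid, code in after.items() if code == area}
    return {
        "stayers": residents_before & residents_after,
        "moved_out": {p for p in residents_before - residents_after if p in after},
        "left_the_data": {p for p in residents_before if p not in after},
        "moved_in": {p for p in residents_after - residents_before if p in before},
        "new_to_the_data": {p for p in residents_after if p not in before},
    }
```

Separating movers from those who simply leave or enter the data matters for 
evaluation, since a fall in claimant numbers in a programme area could 
reflect either. 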

More speculatively as a further stage, such data could in principle be used to 
undertake the type of social policy ‘experiment’ employed in the US in housing 
reallocation projects - for example, to assess the impact of poor 
neighbourhoods by selective reassignment of poor families to areas with low 
levels of poverty and subsequent monitoring of their progress in the new 
environment. An example would be the ‘Moving to Opportunity’ project in 
Boston (Katz et al., 1999), though these experiments would appear to raise in 
sharp form not just the ethics of such monitoring but also the modes of 
selection. 

5.4.2 Exploratory analysis of audit and logging data 

One of the major problems with administrative data is that, while they may 
authoritatively identify the patterns and movements over time, they give 
virtually no information about the reasons for any movements. A number of 
further possible developments may provide some partial help. There exist a 
number of auditing and other logging systems covering large areas of 
administrative data within government. Many of these have 
been developed to identify and check possibly fraudulent activities. Thus 
some benefit systems have a logging procedure that records any changes in 
the benefit files and codes reasons for the change. Many of these are trivial 
but others include significant events (e.g. changes in relationship, additional 
children). If it proved possible to convert some of this information into a usable 
database it could be a very powerful device for charting and explaining 
income and other forms of mobility among children growing up in poverty. 

A further possibility is that information on neighbourhoods, or at least areas, 
obtained from a range of surveys that have a clustered design could be built 
up over time, perhaps by ONS, into a data bank that could be shared between 
surveys. This would be a research analogue to the way that market research 
companies build up information on (postcode) areas based on aggregating 
data from many sources to create an overall profile. 



5.5 Legal and ethical concerns 



This is a highly technical and legally bound area and therefore only some very 
general observations are in order. The overall impression is that, on 
occasions, a strict blanket interpretation has ruled out activities that would be 
acceptable to the Data Protection Registrar (now Information Commissioner). 
Her discussions with the PAT18 team (SEU, 2000) very helpfully indicated 
that the Data Protection Act (DPA) is not intended to ‘prevent the sharing of 
information for beneficial purposes’ nor ‘does the DPA or DPR prevent 
personal information being aggregated into general statistics.... The 
publication of aggregated statistical information (from which individual 
information cannot be deduced) is not blocked by DPA or DPR.’ (SEU, 2000 
[p17]). 

Para 2.7 from SEU, 2000 based on ‘Helpful Discussions with the Data 
Protection Registrar’. 



Data protection and area statistics 

• Neither the DPA nor the DPR is there to prevent the sharing of information 
for beneficial purposes - so long as the information is handled in 
accordance with the law. 

• Nor does the DPA or the DPR prevent personal information being 
aggregated into general statistics. To do this, personal data can be 
anonymised by the department that collects it and then shared; or it can be 
anonymised and aggregated by someone else (for example, ONS) acting 
as an agent and bound by confidentiality. 

• The publication of aggregated statistical information (from which individual 
information cannot be deduced) is not blocked by the DPA or DPR. 

• The main influence of the DPA is that it makes it clear that Government 
must act within the law in collecting and processing data. This means that 
those collecting data have to know and abide by the powers under which 
they collect information and observe any constraints on its use. (Often 
these powers and constraints are in entirely separate legislation, or the 
common law duty of confidentiality.) 

• Departments and agencies are not always aware of the powers under 
which they collect, process and share data. Some departments have 
carried out audits of their powers. This should be encouraged further. 

• When an audit throws up problems, legal powers may need updating to 
allow for the lawful use of information. In some cases the law can be met 
simply by being explicit when collecting information about what statistical 
purposes it might be used for. 

• Generally, it should be possible for agencies to share the data to generate 
area statistics - but this needs to be planned from the moment the data is 
collected, not thrown in as an afterthought. 

• The DPA is a framework not a barrier. The DPR has a role in facilitating 
data sharing for joined up government. 



There may, of course, be very specific reasons why some datasets cannot be 
linked or used. Thus, it appears that the 1970 Finance Act may rule out data 
collected by the Inland Revenue (IR) from being transferred to other 
departments. Thus, while it was possible to make use of Family Credit when 
this was administered by the DSS, since 1999 its successor WFTC has been 
administered by the IR and its use is now more restricted. However, the Tax 
Credits Act 1999 allows the Inland Revenue to disclose tax credits information 
to the DSS (or their contractors) for their social security benefits, child support 
and war pensions purposes. It is not clear whether this would extend to its use 
for research purposes. This will become increasingly important as other major 
benefit systems are transferred to the IR if they are subject to these same 
restrictions. 

Use may also be barred by explicit undertakings made to respondents or 
rulings by Ethics Committees or other bodies. Thus research access to 
individual pupil data records in Northern Ireland required not just the 
permission of the NI Department of Education but also that of all secondary 
schools, as undertakings had been given that only aggregate data would be 
released to any other party. Research ethics committees play a key role in the 
case of data that involve NHS patients, access to medical records, to NHS 
premises or facilities or to, for example, foetal material, the recently dead in 
NHS premises etc. Details of the Local Research Ethics Committees (LREC) and 
of the Multi-centre Research Ethics Committees (MREC), where the research 
involves five or more LRECs, can be downloaded from the Central Office for 
Research Ethics Committees (COREC) at http://www.corec.org.uk/ . 

Access may also depend on who is undertaking the research, and whether 
they are acting as ‘agents’ or ‘contractors’ for groups which legitimately have 
such access or entitlement to use the data for research purposes. 

However, there appears to be no blanket ban. Research, particularly that 
associated with key government objectives (of which the reduction and 
elimination of child poverty would be an outstanding example), constitutes 
‘beneficial purposes’ which could be contrasted with other (potentially harmful) 
purposes such as the better targeting of individuals, for example to deny 
them credit. 

Under the Data Protection Act 1998, data collection, processing, transmission 
and storage have to be in line with the eight Data Protection Principles (see box 
below). 

1. Personal data shall be processed fairly and lawfully and, in particular, 
shall not be processed unless- 

(a) at least one of the conditions in Schedule 2 is met, and 

(b) in the case of sensitive personal data, at least one of the 
conditions in Schedule 3 is also met. 

2. Personal data shall be obtained only for one or more specified and 
lawful purposes, and shall not be further processed in any manner 
incompatible with that purpose or those purposes. 

3. Personal data shall be adequate, relevant and not excessive in 
relation to the purpose or purposes for which they are processed. 

4. Personal data shall be accurate and, where necessary, kept up to 
date. 

5. Personal data processed for any purpose or purposes shall not be 
kept for longer than is necessary for that purpose or those purposes. 

6. Personal data shall be processed in accordance with the rights of 
data subjects under this Act. 

7. Appropriate technical and organisational measures shall be taken 
against unauthorised or unlawful processing of personal data and 
against accidental loss or destruction of, or damage to, personal data. 

8. Personal data shall not be transferred to a country or territory outside 
the European Economic Area unless that country or territory ensures 
an adequate level of protection for the rights and freedoms of data 
subjects in relation to the processing of personal data. 



For research use, a key section in the 1998 Data Protection Act is Section 33 
(‘Research, history and statistics’). Provided that data are processed in ways 
that meet the ‘relevant conditions’, that is: 

(a) that the data are not processed to support measures or 
decisions with respect to particular individuals, and 

(b) that the data are not processed in such a way that substantial 
damage or substantial distress is, or is likely to be, caused to any 
data subject. 

then the ‘further processing of personal data only for research purposes in 
compliance with the relevant conditions is not to be regarded as incompatible 
with the purposes for which they were obtained.’ (Section 33). Such data can 
be kept indefinitely and such personal data ‘which are processed only for 
research purposes are exempt from Section 7 [Rights of Access to 
Information] if: 



(a) they are processed in compliance with the 
relevant conditions [as above], and 

(b) the results of the research or any resulting statistics are 
not made available in a form which identifies data subjects 
or any of them’. (Section 33.4) 

While this sets the main conditions to do with not targeting individuals and 
disclosure, it appears to be compatible with a significant amount of use for 
research purposes. It does not, of course, give any indication about the 
processing of any particular dataset and the undertakings or powers under 
which it may have been collected. There may also be questions of ownership 
and access rights, for example to the data in the form required to make any 
such linkage effectively. 

In addition to the concern about the general implications of the Data 
Protection Act 1998, there is also concern that linking data together in this 
way may make it potentially easier to identify individuals and thereby breach 
Section 33, even if only inadvertently. This would seem to be less of a 
problem with the primary usage, where the data collector must, in principle, 
have some form of access to the identifying data in the first place to collect the 
information, than with subsequent secondary use by others. This may be a 
particular problem where research projects are required to deposit data in 
archives and other locations. Access to the primary identifying data would 
normally be covered by the survey practice of keeping data anonymous, and 
identifying lists and codes in a separate highly secure file or system. 

Secondary processing problems might be covered by reducing the amount of 
information available in data with small area geocoding. Thus, the full Labour 
Force Survey dataset is available to researchers from the Essex Data Archive, 
but the LFS set with local district codes (the so-called LFS LA) contains only a 
restricted dataset. Other anonymising techniques include the rounding of 
numbers (used in the ONS Neighbourhood Statistics) and ‘Barnardisation’ 
(randomly adding or subtracting cases where there are very small numbers) of 
aggregate data (used in census SAS or Local Base Statistics datasets). Other 
methods might include signed undertakings on accessing such data by 
researchers, as currently happens to a number of datasets that are covered 
by legal agreements and where any disclosure could lead to immediate 
identification (e.g. Census of Employment where data from employers is 
collected under statutory powers, and could easily identify local employers 
and possibly commercially sensitive information). 
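
The two perturbation techniques mentioned above can be sketched in a few 
lines. This is an illustration under stated assumptions, not the procedure 
actually applied to Neighbourhood Statistics or census outputs: the rounding 
base, the small-cell threshold and the probability of adjustment are all 
hypothetical values chosen for the example. 

```python
import random


def round_to_base(count, base=5):
    """Round a non-negative count to the nearest multiple of `base` (assumed base)."""
    return base * ((count + base // 2) // base)


def barnardise(counts, threshold=5, p_change=0.5, rng=None):
    """Randomly add or subtract one case in small, non-zero cells.

    threshold and p_change are hypothetical parameters. Zero cells and cells
    above the threshold are left untouched; results never go below zero.
    """
    rng = rng or random.Random()
    adjusted = []
    for count in counts:
        if 0 < count <= threshold and rng.random() < p_change:
            count = max(0, count + rng.choice((-1, 1)))
        adjusted.append(count)
    return adjusted
```

Both methods trade a small loss of accuracy in individual cells for a 
reduction in the risk that a published count reveals something about an 
identifiable person. 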

What is needed is a guide to good practice in this field which would make 
clear what is legally permissible and what would be good practice in meeting 
the requirements of the Data Protection Act 1998 for fair data processing for 
research purposes. 

A further important development might be to undertake key data matching and 
linkage in secure locations, for example under ONS jurisdiction, with regulated 
access to any product. 



CHAPTER 6 - CONCLUSIONS AND RECOMMENDATIONS 



6.1 Introduction 

The first conclusion we draw from this study is, in one way, a negative one. 
The large range of material we have covered, and the extent of possible child 
outcome measures, indicate clearly to us that there could not be a single 
study that took on board all that would be needed to chart and explain the 
relationships between the experience of poverty in childhood, however 
defined, and the major outcomes in the short and medium term. And even if 
this were possible, then the ‘subjects’ at the centre of this exercise would 
surely come to view the research imposition on them as overbearing, as every 
aspect of their lives was scrutinised directly or indirectly. 

With increasing sophistication of research measurement, the studies we have 
reviewed (listed in Appendix 2) demonstrate the current state of the art in 
terms of data collection. Thus, ALSPAC might represent the currently best 
achievable dataset on early child health and development; EPPE the best for 
child outcomes in the early years, linked to service quality and preschool type; 
PRILIF and SoLIF, along with the FRS, as the gold standards for measuring 
the complexities of household income and benefits data; the Home Office self 
report studies (e.g. Youth Lifestyles Survey) the best available for assessing 
offending behaviour among young people; the birth cohort studies best at 
measuring educational progress and achievement; BHPS best at measuring 
income or poverty dynamics; and the clear potential for the use of 
administrative data to measure poverty dynamics for people in receipt of 
benefits. 

But clearly each of these studies does not ‘play’ so strongly in other parts of 
the field. Those that are strongest in the income domain often have very light 
coverage of other fields, particularly some of the more difficult to measure 
outcomes (for example, offending behaviour). Indeed, apart from the many 
studies that demonstrate an association between income and outcomes, we 
are aware of very few studies based on UK data that go beyond that towards 
estimating at least some of the paths in our basic model represented by Fig. 
2.1. McCulloch and Joshi (2001a, b) link income and neighbourhood data to 
cognitive outcomes for children of the NCDS cohort and a recent DSS-sponsored 
study looks at the link between income dynamics and adolescent 
outcomes using BHPS data (Ermisch and Francesconi, 2001). The Education 
Maintenance Allowance pilot studies also suggest that raising incomes via 
allowances may promote staying on at school, but that this may vary by 
geographical area and social group (Ashworth et al., 2001). 

From another perspective only the panel studies and to a limited extent (but 
with potential, we would argue, for more) the administrative datasets allow for 
a very strong handle on the crucial question of the time dimension, or the 
welfare or income dynamics. To date, the major longitudinal birth cohort 
studies have had too infrequent a cycle to pick up more than rather broad 
changes, although they do cover all the period of 'childhood'. Strikingly, the 
more fine grain and closely spaced studies are, the more variability or mobility 
there is, underlining perhaps in new ways the uncertainties and instabilities of 
at least some child poverty at the beginning of the 21st century. 

Finally, on the question of geographical linkage there are variable prospects 
for effective data linkage, depending in part on whether samples are clustered 
in some way. Again the position might vary from, at one extreme, 
administrative data that include every case, studies that are concentrated in 
one area or region (for example, ALSPAC), surveys with a national clustered 
sampling procedure (MCS), and at the other extreme a more or less evenly 
spread national sample survey with very few cases in any area (as in the 
1946, 1958 and 1970 longitudinal cohort studies, where date of birth was the 
criterion for entry). 

Hence our first conclusion is that it would not be feasible to design a research 
exercise that somehow maximised on all these very different features. So, in 
another way, our first conclusion is positive - we do not see the need for a 
large amount of totally new data collection exercises. If such were needed, 
they should probably best be built into, or derive from, existing and planned 
studies. For example, future waves of BHPS could be expanded to include 
new samples of young people and to collect more detailed data about families 
with young children. But we do believe there is scope for a lot more data 
analysis. To put this another way, we believe the infrastructure is, or soon will 
be, in place to provide answers to the questions posed at the beginning of this 
report. Some of the answers might not emerge for some time but that is the 
nature of research that needs, and relies on, longitudinal data. We know no 
way of speeding that up. 



6.2 Research strategy 



We therefore see the most effective strategy as being one that starts from the 
existing range of possible datasets and builds on these. There would be six 
major elements to such a strategy: 

6.2.1 At the individual level: linking administrative data over time 

As we have noted, administrative data have been used for many years, and 
programmes to link datasets have existed at least since the 1960s (for 
example, the Oxford Record Linkage Study - Gill, 2001). However, this 
development has gathered speed in the last few years as more central 
government systems have become available for analysis in various forms. 
Many of these are directly relevant to children (for example, the Child Benefit 
system), child poverty (for example, DSS benefit systems that include details 
of children), and data on outcomes (for example, pupil records and 
examination results). There are technical issues in linking these data together, 
but there are increasing numbers of projects doing this and techniques, data 
availability and linking variables have become more readily available. We see 
very substantial steps being made in this area as some of the exercises 
currently underway come to fruition (for example, work in ONS to generate 
reliable small area income estimates - see section 4.2.2). While there have 
been longitudinal datasets covering the labour market domain for many years 
(see section 5.3.1 on the JUVOS cohort and LLMD), those including children 
are much more recent. At the same time, ethical and data protection issues 
that are associated with this type of data linkage have become clearer, as we 
set out in Chapter 5. While there are a number of major outstanding issues, 
for example the location, storage and research access to such linked 
datasets, we would argue that this development could make a major 
contribution to a framework of data linking child poverty, child outcomes, 
service quality and neighbourhood-level data. 

However, it is likely that such administrative data would always explain only 
part of the story and leave other bits tantalisingly out of reach (as people leave 
the benefits system or other aspects of the state’s purview). It might be that, in 
due course, everything will be logged in some system or other, but we are 
sceptical whether administrative data on their own could ever close the door 
on what is needed to unravel the linkage between child poverty and child 
outcomes. However, in our view they will have an increasing role, sometimes in 
support of other more intensive data collection, and sometimes as a precursor 
to more intensive studies (as they can act as an effective research ‘screen’ for 
picking out overall patterns across very large population-wide datasets). 

So some part of the effort in the future should go into developing such a 
framework of relevant administrative data, by building on what has already 
been done, but also experimenting with further developments, particularly 
trying to generate more dynamic sets of information. The model could be the 
existing JUVOS cohort or LLMD systems. Such developments on the child 
poverty/child outcome field should be linked to other developments to produce 
codes of practice, and procedures and location for storing and granting 
access to these growing bodies of data. It is clear that some of the ethical and 
legal issues touch on these second-order uses. It may be acceptable for one 
group to link such information and use it for a research study, but access by 
subsequent groups may throw up problems if, for example, additional data 
result in potential (unwitting) disclosure. This may require a central resource 
such as ONS as the holding agency and possibly their also providing 
‘safe-setting’ analysis facilities as they do at present for the ONS Longitudinal 
Study. Therefore, our recommendations are to: 

1. Build up a longitudinal administrative database directly relevant to child 
poverty along the lines of the LLMD initiative. 

2. Explore the use of further administrative data that may throw light on any 
changes/dynamics. 

3. Develop mechanisms to link and access these data securely. 

6.2.2 At the individual level: linking administrative data to surveys 

With the increase of information in a form where linkage between 
administrative data and surveys could potentially occur (for example, unique 
pupil numbers and the associated educational records; health records), we 
would argue that many existing research studies could, in principle, add 
significantly to their impact and value by linking in such data. This is likely to 
require more accessible information about what is available, information on 
the possible mechanisms for linkage, and it would certainly require codes of 
practice about ethical and legal issues. 

We give four examples of surveys that could profitably add in significant 
amounts of individual data from administrative sources. In each case, there is 
already some work along these lines either in place or planned, but facilitating 
it in the areas of particular relevance to this report could be valuable. 

1. Linking more local health, education and socio-economic data (including 
possibly benefits data) into ALSPAC. 

2. Linking data from Child Health Records and other routine health data on, 
for example, hospital admissions into the early waves of the Millennium 
Cohort Study. 

3. Linking educational data, and data about schools, into the British Youth 
Panel component of the BHPS. 

4. Linking administrative data, especially benefits data, into evaluations of 
area-based initiatives such as Sure Start and New Deal for Communities. 

6.2.3 At the aggregate level: linking administrative data about neighbourhoods and 
services to individual level surveys 

Just as individual administrative data could be linked to individual survey 
responses, so could administrative data about neighbourhoods (however 
defined) and services be linked to records that have the relevant geocodes. 
There is a growing body of neighbourhood statistics - the Indices of 
Deprivation 2000, data generated from censuses, schools' examination 
results and so on - that could be linked. Some words of caution are, however, 
needed about this approach. These were highlighted in Chapter 4 and in 
Appendix 3. For example, is the neighbourhood statistic necessarily valid for 
the sample member (because of boundary problems and sampling issues)? In 
addition, we have already flagged the issue (well covered by the PAT18 
report) of changing administrative boundaries, postcodes and other 
geographies, which can make any longitudinal comparisons exceedingly 
difficult. We believe the issue of estimating neighbourhood effects and 
separating them from the effects of services is a complex one that warrants 
further investigation. 

Again, we give three examples where aggregate data could be linked to 
studies: 

1. Not only neighbourhood-of-residence data but also adjacent 
neighbourhood data could be added to ALSPAC for different time points, 
thus creating the possibility of a thorough examination of neighbourhood 
effects on child outcomes. 

2. MCS and all evaluations of ABIs would benefit from the addition of relevant 
neighbourhood data. 

3. Clustered surveys - for example, SoLIF/SoF and the Mental Health of 
Children and Adolescents Survey - could draw on outside neighbourhood 
data to help explain any neighbourhood effects in the data. 

6.2.4 Constructing and using neighbourhood data from clustered designs 

Most big national surveys have a design that is geographically clustered, 
commonly by postcode sector. Clustering is generally used for sampling 
efficiency. It would be possible to exploit the information contained in the 
clusters much more than it is at present. For example, it would be possible to 
get an estimate of the mean prevalence of mental health problems per cluster 
for families with a child aged 5 to 15 from the Mental Health Survey of 
Children and Adolescents. This information could then be used in a statistical 
model to explain any variability in prevalence across postcode sectors. 
Moreover, some of the problems discussed earlier about the need to separate 
within and between neighbourhood variability in order properly to assess 
neighbourhood effects are less acute with clustered designs so that the 
combination of survey generated and administrative data will be more 
convincing. 
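
By way of a hedged illustration, cluster-level prevalence and a simple 
split of the variability into between-cluster and within-cluster parts might 
be computed as below. This is a descriptive decomposition only, not the 
statistical model referred to in the text, and the input format (pairs of a 
cluster identifier and a 0/1 indicator) is an assumption made for the sketch. 

```python
from collections import defaultdict
from statistics import mean


def cluster_summary(records):
    """Summarise a 0/1 outcome by sampling cluster.

    records: iterable of (cluster_id, has_problem) pairs, has_problem in {0, 1}.
    Returns (prevalence per cluster, between-cluster variance,
    within-cluster variance); the two variances sum to the overall
    population variance of the indicator.
    """
    by_cluster = defaultdict(list)
    for cluster_id, has_problem in records:
        by_cluster[cluster_id].append(has_problem)

    all_values = [v for values in by_cluster.values() for v in values]
    n = len(all_values)
    overall = mean(all_values)
    prevalence = {c: mean(values) for c, values in by_cluster.items()}

    # Weighted squared deviation of each cluster mean from the overall mean.
    between = sum(
        len(values) * (prevalence[c] - overall) ** 2
        for c, values in by_cluster.items()
    ) / n
    # Squared deviation of each case from its own cluster mean.
    within = sum(
        (v - prevalence[c]) ** 2
        for c, values in by_cluster.items()
        for v in values
    ) / n
    return prevalence, between, within
```

A large between-cluster share would suggest that area-level factors are worth 
modelling. With the small numbers of cases typical of a single cluster, the 
cluster means are themselves noisy, so a multilevel model would normally be 
preferred for inference. 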

6.2.5 Using data from evaluation studies and designed experiments 

This fifth element of our strategy is a little different in that it refers to a different 
research approach. This report has concentrated on studies and data sources 
that can provide estimates for some or all of the pathways specified in Fig. 
2.1. In other words, we have been concerned mostly with observational data 
that can reveal associations and that can also, in certain circumstances, 
provide explanations of whether and how changes in income lead to changes 
in child outcomes. There is, however, another way of learning about the link 
between income and child outcomes and that is by deliberately changing 
income and seeing what happens to later outcomes. In some ways, this 
happens all the time, especially with the benefit system as new benefits are 
brought in or upgraded (and so some gain) and others are reduced or phased 
out, creating losers. The introduction of the National Minimum Wage in 1999 
may have boosted incomes for some low income families. The impact of the 
switch from Family Credit to WFTC or other significant changes in benefit 
levels might similarly be explored. In all these situations, it is, at least in 
principle, possible to monitor the effects of changes in income and to relate 
them to changes in child outcomes. The problems with this approach are, first 
that it is often very difficult to separate the effects of the change in income 
from other changes in society that everyone experiences, and, second that 
the inferences are often based on aggregate data rather than on data for 
those individuals who either did or did not experience a rise in income. 

Therefore, a more convincing approach to the question of what happens if 
children in families experience a marked rise in income is to do an experiment 
of some kind so that some but not all families in poverty receive additional 
income. Those families, ideally selected by chance, receiving the income 
boost are compared with those - the control group - who are not so lucky. This 
approach has been tried in the United States (for example, the Negative 
Income Tax experiments) and, more recently, in Canada as part of the Self-Sufficiency
Project (http://www.srdc.org/english/projects/SSP.htm). Education 
Maintenance Allowances (see Ashworth et al., 2001), introduced on a pilot 
basis in England in 1999, represent a similar idea. The main concern of 
experiments of this kind is to establish whether or not rises in income lead to 
improved child outcomes. Methods for the design and analysis of studies to 
evaluate the effects of this kind of intervention are given in, for example, 
Plewis and Preston (2001) and the references therein. 
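The logic of such an experiment can be sketched in a few lines. The Python below is a minimal illustration, not the design of any of the studies cited: it assumes simple random assignment of families to an income boost, a single continuous child outcome, and an invented true effect of 2 points. All numbers are simulated and the function name `difference_in_means` is hypothetical.

```python
import random
import statistics


def difference_in_means(treated, control):
    """Estimated effect of the income boost and its standard error."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff, se


rng = random.Random(42)
families = list(range(4000))
rng.shuffle(families)                     # selection by chance
treated_ids = set(families[:2000])        # half receive the income boost

TRUE_EFFECT = 2.0                         # invented for the simulation
outcomes = {
    f: 50.0 + rng.gauss(0, 5) + (TRUE_EFFECT if f in treated_ids else 0.0)
    for f in families
}
treated = [outcomes[f] for f in families if f in treated_ids]
control = [outcomes[f] for f in families if f not in treated_ids]

effect, se = difference_in_means(treated, control)
print(f"estimated effect: {effect:.2f} (standard error {se:.2f})")
```

Because assignment is by chance, other changes in society affect both groups alike, so the comparison is made on individual rather than aggregate data and avoids the two problems noted above.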

6.2.6 Better exploiting existing data 

The final recommendation concerns a mechanism for deriving the 
maximum advantage from the very wide range of data and data sources we 
have reviewed. This is based on the observation that the range of research 
skills involved in studying the links between child poverty and child outcomes 
is enormous (that is, expertise is required across disciplines to take in a very 
wide range of substantive areas). If we combine this with the point that no one 
single data source or study, however linked, rich and extensive, could answer 
the range of questions set, then the solution has to be as much organisational 
as technical. What we have in mind here is that working groups or teams of 
some kind could be established around certain key cross-cutting themes - 
that is some part of the field denoted in Fig. 2.1. Their responsibility would be 
to draw on the wide range of evidence emerging from the type of study we 
have reviewed, to give what in the jargon has been called ‘best evidence 
synthesis’ (Slavin, 1986). This differs slightly from the normal meta-analysis 
approach to quantitative reviewing, where all studies, as it were, are grist to 
the mill. The alternative method applies strict quality control to the studies that are 
taken into account. This would seem to be more appropriate when the 
issue is not simply ‘did treatment X work better than treatment Y’ where the 
Cochrane style meta-analysis may be most appropriate, but a much more 
complex sequence of events and processes of the kind we have outlined in 
Chapters 2 and 3. Here the quality of the data and the analysis required may 
be crucial to discerning a robust set of conclusions. 
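The contrast between pooling every study and pooling only those that pass a quality screen can be illustrated numerically. The sketch below uses standard inverse-variance (fixed-effect) pooling; it is not Slavin's procedure, which also involves narrative judgement, and the three studies, their effect sizes, standard errors and quality scores are all invented for the example.

```python
def pooled_effect(studies, min_quality=0):
    """Inverse-variance pooled estimate over studies meeting a quality threshold."""
    kept = [s for s in studies if s["quality"] >= min_quality]
    weights = [1.0 / s["se"] ** 2 for s in kept]
    estimate = sum(w * s["effect"] for w, s in zip(weights, kept)) / sum(weights)
    std_error = (1.0 / sum(weights)) ** 0.5
    return estimate, std_error, len(kept)


# Hypothetical studies: effect size, standard error and a 1-3 quality rating
studies = [
    {"name": "study A", "effect": 0.30, "se": 0.10, "quality": 3},
    {"name": "study B", "effect": 0.10, "se": 0.10, "quality": 3},
    {"name": "study C", "effect": 0.80, "se": 0.20, "quality": 1},
]

all_studies = pooled_effect(studies)                 # every study is grist to the mill
screened = pooled_effect(studies, min_quality=2)     # quality screen applied first

print(f"all studies: {all_studies[0]:.3f} from {all_studies[2]} studies")
print(f"screened:    {screened[0]:.3f} from {screened[2]} studies")
```

In this invented case the weak study pulls the pooled estimate upwards, which is the kind of distortion a quality screen is meant to prevent.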

More work needs to be done to fill out this recommendation, but it might entail 
cross-disciplinary groups meeting or working to review relevant studies. They 
might also be able to suggest additional data collection elements to 
forthcoming studies, in order to throw more light on a sequence for which 
there was not, as yet, sufficient data. Thus, it might be that such a group 
would identify a data need that might be met by a future element in the 
Millennium Cohort Study or ALSPAC. For example, if there were strong 
evidence that children in long-term poverty seemed to have more problems in 
the transition to secondary school and rapidly fell further behind (this is an 
example, not a substantive point, though it is based on US evidence that 
children do drop back in educational performance over the summer vacation), 
they could then encourage or commission these studies to look more closely 
at this kind of question. 

It may be that this is done already, and indeed there are many groups that 
operate in this way around the birth cohort studies. It is also possible that the 
new national co-ordinating centre for evidence based policy and practice, 
established by the ESRC at Queen Mary and Westfield College, University of 
London (http://www.politics.qmw.ac.uk/currentnews.shtml) will take up issues 
of this kind. The centre also has nodes at the Social Policy Research Unit at 
York and one specifically related to children at City University. But, if so, we 
would suggest that these groups could be funded to be dedicated in their 
focus on child poverty and child outcomes, to reflect the centrality of these 
issues to policy in the medium and long terms. 



6.3 Further specific suggestions 



The six proposals in (6.2) represent our main general conclusions. In addition, 
we make a number of more specific suggestions - illustrative and certainly not 
exhaustive - about analyses of existing data that would, we believe, enhance 
our understanding of the links between child poverty and child outcomes. We 
list them in the order in which they could be carried out, starting with the 
earliest: 

1. It should be possible to use the data from the Mental Health of Children 
and Adolescents Survey to establish if there are any 'neighbourhood' (that 
is, sample postcode) effects on the health outcomes, having controlled for 
family income. This analysis could serve as an exemplar of how the 
clustering built into many survey designs could be used, by linking external 
sources of data at the neighbourhood level to the individual file. 

2. It would be possible to extend the analysis in McCulloch and Joshi (2001a), 
who use data from the children of the NCDS cohort. The child outcome - a 
score on a picture vocabulary test - could be related to parental income at 
three time points (when the cohort member was aged 23, 33 and 42), to 
the income of the grandparents and to a measure of the parenting 
behaviour at home. This would add to our understanding of the effects of 
income dynamics, and inter-generational transmissions of income, on child 
outcomes. 

3. With the collection of more child outcomes in PRILIF and SoLIF/SoF, it 
should be possible (by 2003) to look more closely at the link between 
income dynamics and child outcomes from those two studies. The 
relatively long time span of PRILIF would be helpful here as, eventually, 
will the extension of SoLIF to include all families with children. The 
analyses of the data from these two studies would be strengthened by the 
collection of some of the process variables described in Chapter 3. 

4. Data on attainments at Key Stage 2 for the ALSPAC cohort should 
become available for analysis by 2004. These (and other outcomes) could 
then be used in a model that would come close to the one set out in Fig. 
2.1, especially if neighbourhood and service data were linked to the 
individual data file. It is worth noting that the ALSPAC data are not publicly 
available and their use requires funds that contribute to the life of the 
study. 

5. Data for children aged about 30 months should become available from 
both MCS and the Sure Start evaluations by 2005. This age is perhaps a 
little young to expect substantial income effects but, by 2007, when the 
children will be approaching the start of school, more informative analyses 
should be possible. 

6. Additional administrative information that is currently not available for 
research purposes, particularly the WFTC data, should if possible be 
added to existing administrative datasets made available for analysis. If 
this is not possible for legal reasons, consideration may need to be given 
to appropriate legislation. 
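Suggestion 1 above depends on attaching area-level data to the individual survey file. The sketch below shows the basic linkage step, assuming that both sources carry a common area key. The field names (`postcode_sector`, `deprivation_index` and so on), the example values and the function name `link_area_data` are hypothetical and do not describe the layout of any of the datasets reviewed here.

```python
def link_area_data(individuals, area_table, key="postcode_sector"):
    """Attach area-level variables to each individual record via a shared key."""
    linked, unmatched = [], []
    for person in individuals:
        area = area_table.get(person[key])
        if area is None:
            unmatched.append(person)        # keep track of failed links
        else:
            linked.append({**person, **area})
    return linked, unmatched


# Hypothetical individual file (one row per child)
individuals = [
    {"child_id": 1, "postcode_sector": "AB1 2", "income_band": 2, "health_score": 14},
    {"child_id": 2, "postcode_sector": "AB1 2", "income_band": 4, "health_score": 9},
    {"child_id": 3, "postcode_sector": "ZZ9 9", "income_band": 1, "health_score": 17},
]

# Hypothetical external neighbourhood source, keyed on the same area code
area_table = {
    "AB1 2": {"deprivation_index": 41.5},
    "CD3 4": {"deprivation_index": 12.0},
}

linked, unmatched = link_area_data(individuals, area_table)
print(len(linked), "records linked;", len(unmatched), "unmatched")
```

Reporting the unmatched records separately matters in practice, since incomplete linkage can itself bias estimates of neighbourhood effects.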

There are also some other outstanding questions that would benefit from 
further research: 

• There is little information about intra-family transfers of income and 
especially about how family income is used for children, and whether 
and how this varies by family income levels. While our brief was 
essentially to review quantitative studies, this might be an example 
where there was need for both quantitative and qualitative enquiry. 
Estimates by Platt (forthcoming) show that in families with larger 
numbers of children living on basic means tested benefits there is 
proportionately less funding per child, as family size increases. However, 
we have very limited information about the impact of this constraint on 
actual allocations within the family. It would probably require qualitative 
studies to illuminate this area. 

• Defining a ‘neighbourhood’ for the purposes of this paper, and whether 
this can be turned into some administrative routine. There is evidence 
that some countries (for example, the Netherlands) do have ways of 
identifying ‘neighbourhoods’ in a better than administrative sense. It 
would be worth investigating to what extent Census Output Areas in 
2001, and aggregates thereof, represent ‘neighbourhoods’. 

• The problem of identifying groups such as ethnic minorities that may 
follow very different trajectories from other groups. Most survey data 
have very few cases for some ethnic groups. Most administrative data 
are not ethnically coded. The Millennium Cohort Study will in time fill out 
this picture as it over-samples areas with high levels of ethnic minorities. 
There may be scope for some existing surveys to undertake booster 
samples to extend this aspect. 

• Similarly there may be other booster samples or follow-up studies that 
could be undertaken to enhance data on key groups or key age points. 
For example, if the DfES Longitudinal Study of Young Persons’ 
Transitions goes ahead, it could be invaluable in helping to fill the gap of 
what is sometimes referred to as the ‘missing cohort’ problem: the cohort 
of children born in the mid-1980s who are now in the late stages of 
compulsory schooling and about whom little is known. 

• Data on access to and use of services needs to be developed. 

• More thought needs to be given to the question of measuring service 
quality and how 'quality' data can be integrated into analyses linking 
child poverty to child outcomes. 


APPENDIX 1 - MODELLING INDIVIDUAL EFFECTS 



Let us first consider the model shown on p.8 and assume that we have 
measures of changes in income ($x_1$), diet ($x_2$), health ($y_1$) and school success 
($y_2$). We will also assume, for convenience, that all of these measures are 
continuously distributed. This is a recursive model that can be illustrated as: 

[Figure: path diagram of the recursive model linking $x_1$, $x_2$, $y_1$ and $y_2$]

We can write down this model just as a series of regression equations: 

$x_2^{(t)} = a_0 + a_1 x_1^{(t-1)} + e_1$   (1a)

$y_1 = \ldots$   (1b)

$y_2 = \ldots$   (1c)

where the superscripts (t-1, t, t+1) represent an assumed time ordering 
although, in practice, some measurements will be obtained at the same time. 
In equation (1a), $x_2$ is the change in diet (from time (t-1) to time t) and $x_1$ the 
change in income (from time (t-2) to (t-1)) and similarly for the other two 
equations. These equations can be estimated separately and the estimates 
will indicate how strong the 'causal' links are. 
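Because the model is recursive, each equation can be fitted on its own. The sketch below illustrates that point with ordinary least squares on simulated data. It assumes, purely for illustration, a simple chain (change in income, then diet, then health, then school success) with one regressor per equation; the chain structure, the coefficient values and the helper name `simple_ols` are assumptions and are not taken from the model on p.8.

```python
import random


def simple_ols(x, y):
    """Intercept and slope from a least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Simulated changes; the 'true' links 0.5, 0.4 and 0.6 are invented
rng = random.Random(1)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]           # change in income
x2 = [0.5 * v + rng.gauss(0, 1) for v in x1]       # change in diet
y1 = [0.4 * v + rng.gauss(0, 1) for v in x2]       # change in health
y2 = [0.6 * v + rng.gauss(0, 1) for v in y1]       # change in school success

# Each equation of the recursive system is estimated separately
a0, a1 = simple_ols(x1, x2)
b0, b1 = simple_ols(x2, y1)
c0, c1 = simple_ols(y1, y2)
print(f"income -> diet: {a1:.2f}; diet -> health: {b1:.2f}; health -> school: {c1:.2f}")
```

Separate estimation is valid here only because no variable feeds back on an earlier one and the error terms are simulated as independent.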

It is, however, possible that school success and health form a feedback loop 
as illustrated: 






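The equations are now: 

$x_2^{(t)} = a_0 + a_1 x_1^{(t-1)} + e_1$   (2a)

$y_1^{(t+1)} = b_0 + b_1 y_2^{(t+1)} + b_2 x_1^{(t-1)} + b_3 x_2^{(t)} + e_2$   (2b)

$y_2^{(t+1)} = c_0 + c_1 y_1^{(t+1)} + c_2 x_1^{(t-1)} + c_3 x_2^{(t)} + e_3$   (2c)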

where (changes in) health and school success are assumed to be measured 
at the same time (t+1), and where each influences the other. Equations (2b) 
and (2c) form a set of simultaneous equations because $y_1$ and $y_2$ each appear 
on both the right and left hand sides. As they stand, these equations are not 
identified and cannot, therefore, be estimated. Restrictions would need to be 
imposed - for example, excluding $x_1$ from (2b) and $x_2$ from (2c) - and then a 
technique like two stage least squares could be applied. Further details about 
these issues, as they apply to longitudinal data, can be found in Plewis (1985, 
pp. 67-71). 
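Two stage least squares for the health equation can be sketched as follows. The example assumes, for illustration only, that income change ($x_1$) is excluded from (2b) and diet change ($x_2$) from (2c), so that $x_1$ can serve as an instrument for school success in the health equation. The data are simulated, the coefficient values are invented, intercepts are set to zero, and the helpers `solve` and `ols` are hypothetical names written for this sketch.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Restricted system used for the simulation (all values invented):
#   y1 = B1*y2 + B3*x2 + e2      (x1 excluded from the health equation)
#   y2 = C1*y1 + C2*x1 + e3      (x2 excluded from the school equation)
B1, B3, C1, C2 = 0.5, 0.8, 0.3, 1.0
rng = random.Random(0)
x1, x2, y1, y2 = [], [], [], []
for _ in range(5000):
    u1, u2, e2, e3 = (rng.gauss(0, 1) for _ in range(4))
    # Reduced form for y1, found by substituting the y2 equation into the y1 equation
    health = (B1 * C2 * u1 + B3 * u2 + B1 * e3 + e2) / (1 - B1 * C1)
    school = C1 * health + C2 * u1 + e3
    x1.append(u1)
    x2.append(u2)
    y1.append(health)
    y2.append(school)

# Stage 1: regress the endogenous regressor y2 on all exogenous variables
g = ols([[1.0, a, b] for a, b in zip(x1, x2)], y2)
y2_hat = [g[0] + g[1] * a + g[2] * b for a, b in zip(x1, x2)]

# Stage 2: regress y1 on the fitted values of y2 and the included exogenous x2
tsls = ols([[1.0, f, b] for f, b in zip(y2_hat, x2)], y1)

# Ordinary least squares on the same equation, for comparison
naive = ols([[1.0, s, b] for s, b in zip(y2, x2)], y1)
print(f"2SLS estimate of b1: {tsls[1]:.2f}; OLS estimate: {naive[1]:.2f}; true value: {B1}")
```

In this simulation the ordinary least squares estimate is pulled away from the true value by the feedback loop, while the two stage estimate is not; the result depends entirely on the exclusion restrictions being correct.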







APPENDIX 2 - DATASET 

A. List of the most useful surveys/datasets 

(i) Surveys/Panels with Longitudinal Data 

STUDY A1 ALSPAC (Avon Longitudinal Study of Parents and Children)

http://www.ich.bris.ac.uk/ALSPACext/Default.html 

Study Design A longitudinal study of all children born to mothers who, when pregnant, were 
resident and whose expected date of delivery was between 1 April 1991 and the end of 1992. 
14K; those moving out of Avon have been retained in the sample. 

Child Outcomes Education: test data, school entry assessment and KS1 SAT data. 

Health: Wide range, including extensive physical samples. 

Crime: Self-reported at age 10. 
Income Some data on four occasions. 

Income proxies. 

Processes Wide range for education and health. 

Neighbourhood Data The study initially included all births in a relatively small area, so the 
sample is highly 

chosen. It also includes data on primary schools attended, health services used and 

Strengths 

(i) Present day longitudinal data on childhood and hence relevant to current policies; 

(ii) Strong on outcomes and on processes; 

(iii) Potential to analyse neighbourhood effects. 

Weaknesses 

(i) Restricted to Avon; 

(ii) Income data less strong than outcome and process data; 

(iii) Data collection runs in advance of resources to analyse the data. 



STUDY A2 BCS70 

http://www.cls.ioe.ac.uk/Bcs70/bhome.htm 

Study Design A longitudinal study of all UK births early in April 1970 - originally about 
17K - followed 

Child Outcomes Nearly all outcomes covered. 

Income Some data on two occasions. 

Limited income proxies. 

Processes Wide range. 

Neighbourhood Data Not clustered; not geocoded until adulthood. 

Strengths 

(i) Strong on outcomes and on processes; 

(ii) Covers all the UK. 

Weaknesses 

(i) Refers to the previous generation of children, perhaps restricting its policy relevance; 

(ii) Income data not strong; 

(iii) Problems of attrition and non-response at age 16; 

(iv) No neighbourhood data. 

STUDY A3 BHPS: British Household Panel Survey 

http://www.iser.essex.ac.uk/bhps/ 

Study Design A longitudinal (panel) survey of a probability sample of households in 1991, 
originally 

annual survey of each 16+ adult member of an original sample of 5000+ households 
followed to new households that they may form and interviewed with other adults in 
its eleventh ‘wave’. From Wave 4 a survey of young people - the British Youth Panel 
adult respondents aged 11-15 in a separate panel study. Approximately 1600 young 
once in the BYP. 

Child Outcomes Nothing under age 11, wide range over 16. Focus of BYP on health and health 

behaviours 

more recently. 

Income A core questionnaire is administered every year that collects detailed information on 
and household composition. Supplementary questions are also asked either on a biennial 
Derived household income variables are created prior to the release of the survey. 

Processes Very little available. 

Neighbourhood Data Initially clustered by postcode sector and subsequently geocoded. 

Strengths 

(i) Detailed income and income dynamics data; 

(ii) Rapid release of data. 

Weaknesses 

(i) Not explicitly focused on children; 

(ii) Little process data on links between low income and outcomes. 

STUDY A4 Mental Health of Children & Adolescents Survey 

http://www.statistics.gov.uk/themes/healtti_care/dwb.asp 

Study Design Cross-sectional survey in England and Wales based on a probability sample of 
postcode 

to 15. Some data were obtained for about 10K children. There has been a subsequent 
full interview follow-up planned for 2002. 

Child Outcomes The focus of this study is on children's mental health, especially conduct 
disorders, 

is a small amount of data on educational outcomes, 
Income Income data were collected (in bands). 

Processes There is little information about intervening variables. 

Neighbourhood Data Initial sample was clustered by postcode with an average of 25 

respondents in each 

Strengths 

(i) health outcome data; 

(ii) potential for looking at neighbourhood effects. 

Weaknesses 

(i) only cross-sectional data available at present; 

(ii) income data not strong. 

STUDY A5 MCS: Millennium Cohort Study 

http://www.cls.ioe.ac.uk/Mcs/mcsmain.htm 

Study Design A longitudinal study of births in the UK over a 15 month period from September 2000, 

interviewed when the child is 9 months old. 

Child Outcomes All relevant outcomes will, eventually, be covered. 

Income To be collected in some detail at each wave. 

Processes Wide range to be collected. 

Neighbourhood Data Initially clustered, focus on social capital, and external 

data to be linked 

Strengths 

(i) Will cover all the parts of Fig. 2.1. and so likely to be relevant to policy in the future. 

Weaknesses 

(i) Longitudinal data not available for several years. 

STUDY A6 NCDS: National Child Development Study 

http://www.cls.ioe.ac.uk/Ncds/nhome.htm 

Study Design A longitudinal study of all GB births early in March 1958 - originally about 
17K - followed 

sample of children of cohort members was surveyed when the parents were 33. 

Child Outcomes Nearly all outcomes covered. 

Income Income data were collected at age 16 and, to some extent, at birth. 

Some income proxies are available. 

Processes Wide range. 

Neighbourhood Data This study is not clustered and was not geocoded until later waves, after 
the cohort 

Strengths 

(i) Strong on outcomes and on processes; 

(ii) Good response rates over time; 

(iii) Children of cohort relevant to current policy. 

Weaknesses 

(i) Refers to a previous generation of children; 

(ii) Income data not strong; 

(iii) No neighbourhood data. 

STUDY A7 PRILIF: Lone Parents Cohort 

http://www.psi.org.uk/ 

Study Design Longitudinal study of GB lone parents, selected from a probability sample of 
postcode 

sample of over 900 and annual/biennial since. 

Child Outcomes This study has only a very limited set of child outcomes - for education and 
health 

2001 onwards. 

Income Extensive income and benefits data are available, enabling detailed measures of income 
a relatively small sample size. 

Processes Limited at present to housing quality and parental smoking. 

Neighbourhood Data Initially clustered but with substantial movement over time. 

Strengths 

(i) detailed income and benefits data over a long period; 

(ii) potential to link to child outcomes and processes; 

(iii) relevant to policies for lone parents. 

Weaknesses 

(i) small initial sample; 

(ii) restricted to lone parents in 1991; 

(iii) opportunities for analysing neighbourhood effects very limited. 

STUDY A8 SoLIF/SoF: Survey of Low Income Families/Families 

http://www.psi.org.uk/ 

Study Design Longitudinal survey, starting in 1999 and annual since, of over 5K low income 
families 

of postcode sectors, extended in 2001 to cover all families with children. 

Child Outcomes This study has only a very limited set of child outcomes - for education and 
health 

2001 onwards. 

Income Extensive income and benefits data are available, enabling detailed measures of income 
Processes Limited at present to housing quality and parental smoking. 

Neighbourhood Data The initial sample was clustered by postcode with an average of 30 
respondents in 

Strengths 

(i) detailed income and benefits data; 

(ii) potential to link to child outcomes and processes; 

(iii) relevant to family policy. 

Weaknesses 

(i) only a short income series at present; 

(ii) due to the restricted nature of the sample - to low income families in waves 
is problematic. 

(ii) Relevant Evaluation Studies 



STUDY A9 EPPE: Effective Provision of Preschool Education 

http://www.ioe.ac.uk/cdl/eppe/ 

Study Design Longitudinal evaluation of preschool effectiveness for a sample of 2857 
children aged 
areas of England 

Child Outcomes Battery of cognitive and social/behavioural measures applied at the beginning 
of the 

Key Stage and other data 

Income No direct measures though parental occupation and educational qualifications/leaving 
household interview planned. 

Processes Extensive observational studies of preschool environments; information from 
parents 

Neighbourhood Data None at this stage. Proposal to link in ID2000 using home postcode. 

Strengths 

(i) Extensive outcome and process data; 

(ii) Service quality data. 

(iii) Relevant to current policy concerns 

Weaknesses 

(i) Not a probability sample of children; 

(ii) No income data at present 

(iii) No data beyond age seven currently proposed. 

STUDY A10 Sure Start Evaluation 
http://www.surestart.gov.uk/text/info.cfm 
Study Design Not yet known in detail 
Child Outcomes " 

Income " 

Processes " 

Neighbourhood Data The impact study will be clustered by local Sure Start project area. 
Although it has 

areas will be included, it is unlikely to be less than 100. 

Strengths 

(i) Highly relevant to current policies 
(ii) Linked to MCS 

Weaknesses 

(i) Restricted to disadvantaged areas. 

STUDY A11 Education Maintenance Allowance (EMA) Pilots 

http://www.namss.org.uk/funds_ema.htm 

http://www.lboro.ac.uk/departments/ss/centres/crsp/ 

Study Design Random sample of 10K 16/17 yr olds carried out in 10 pilot and 11 matched 

control 

payments. 

Child Outcomes Staying on at school 
Income Household composition and income data 
Processes None 

Neighbourhood Data Information on staying-on rates and other characteristics of the area 
used to select 

Strengths 

(i) Example of a planned change in income and its effects on a key child outcome; 

(ii) Highly relevant to policy 

Weaknesses 

(i) No outcomes before age 16; 

(ii) Not a probability sample; 

(iii) No process data. 

(iii) Aggregate datasets/databases of relevant neighbourhood data 



STUDY A12 Neighbourhood Statistics 

http://www.statistics.gov.uk/neighbourhood/catalogue. 

Study Design n.a. 

Child Outcomes KS2 data; University admissions by place of residence, 
Income Family Credit, Income Support, Job Seeker’s Allowance data. 

Processes na 
Neighbourhood Data Yes. 

Strengths 

(i) The best available source for up to date neighbourhood data in a consistent form 

Weaknesses 

(i) Limited availability at present; 

(ii) Not available below ward level. 

STUDY A13 English Indices of Deprivation 2000 (ID 2000) 

http://www.statistics.gov.uk/neighbourhood/catalogue. 

Study Design n.a. 

Child Outcomes KS2 data; staying on rates using child benefit; absenteeism; EAL; entry to HE, 
all at 

Income Ward level income deprivation domain. Ward level child poverty domain. 

Processes None 

Neighbourhood Data Cross-sectional ward level data for all 8414 wards in England for six 
‘domains’ of deprivation. 

Strengths (i) Up to date; 

(ii) Available for the whole of England. 

Weaknesses (i) Not available at individual level; 

(ii) Scope is limited to quantifying the presence of deprivation rather than, 

STUDY A14 Welsh Index of Multiple Deprivation 2000 (IMD 2000) 

http://www.statistics.gov.uk/neighbourhood/catalogue. 

Study Design n.a. 

Child Outcomes KS2 data; staying on rates using child benefit; absenteeism; entry to HE at 
ward level. 

Income Electoral Division level income deprivation domain. Electoral Division level child 
poverty 

Processes None 

Neighbourhood Data Cross-sectional Electoral Division level data for all 865 Electoral 

Divisions in Wales 

Strengths 

(i) Up to date; 

(ii) Available for the whole of Wales. 

Weaknesses 

(i) Not available at individual level; 

(ii) Scope is limited to quantifying the presence of deprivation rather than, for example, 

STUDY A15 Northern Ireland Measures of Deprivation 2001 

http://www.nisra.gov.uk 

Study Design n.a. 

Child Outcomes GCSE/GNVQ points score at ward level; secondary absenteeism, staying on 
rates, 

school, all at ward level. 

Income Ward level income deprivation domain. Ward level child poverty domain. 

Processes None 

Neighbourhood Data Cross-sectional ward level data for all 566 wards in Northern Ireland 
for seven ‘domains’ 

and ‘economic deprivation’ measures are available at Enumeration District level. 

Strengths 

(i) Up to date; 

(ii) Available for the whole of Northern Ireland. 

Weaknesses 

(i) Not available at individual level; 

(ii) Scope is limited to quantifying the presence of deprivation rather than, for example, 

STUDY A16 2001 Census 

http://www.statistics.gov.uk/census2001/default. 

Study Design Released only at aggregate level. Output Areas are the lowest level, with a target of 
100 - 

Child Outcomes Qualifications; principal activity for age group 16-24. 

Income None directly. 

Car ownership and housing tenure are possible proxies. 

Processes None directly 

Neighbourhood Data Effectively the census is small area neighbourhood data 

Strengths 

(i) Universal and uniform coverage. 

Weaknesses 

(i) Only available in aggregate format; 

(ii) No measures of income and quickly becomes out of date; 

(iii) Problems in linking data from 1991 Census at small area level 







STUDY A17 OFSTED EIS System 

http://www.ofsted.gov.uk/public/index.htm 

Study Design Database of OFSTED inspections at school and preschool level. Additionally data 
added. 

Child Outcomes School aggregate results only. 

Income None 

Processes School quality assessment based on inspection reports. 

Neighbourhood Data Contains neighbourhood information on the school in the PICSI report. 

Strengths 

(i) Virtually universal record of maintained schools in England; and from 2002 

(ii) Increasingly includes performance and value added assessments. 

Weaknesses 

(i) School based; 

(ii) Based on professional judgments which may be quantified but not formally 

STUDY A18 LLMD Database (DWP) 

Study Design Aim of building up lifetime labour market database drawing on samples of national 
estimations of pension requirements etc. 

Child Outcomes None 

Income National Insurance annual returns contain information that relates to income 
Processes Data linked in from JUVOS system on unemployment will provide some information 

Neighbourhood Data None 
Strengths 

(i) Now covers a 1% national sample from 1978 with growing amount of linked data 

Weaknesses 

(i) Focus is on individual labour market participation not on children or child poverty, 
administrative child database 

STUDY A19 New Deal Databases 



Study Design Several separate databases covering various New Deals and other initiatives. Building 
data from the Labour Market System (LMS), JUVOS, other benefit data and other relevant 
100% level. 

Child Outcomes Limited. New Deal for Lone Parents contains information on youngest child only. New 
information on ‘outcomes’ to supplement JUVOS data on unemployment. 

Income Benefit data. 

Processes Some information relevant to progress on the New Deal programme 
Neighbourhood Data None; postcode level data held but aggregation normally to local 
authority/constituency 

Strengths 100% coverage of key groups relevant to child poverty such as lone parents. Longitudinal 

Weaknesses 

(i) Limited to groups participating in the New Deal programme 

(ii) Limited information to link child poverty to child outcomes. 



B. Surveys and Other Studies of Possible Use 



STUDY B1 

Longitudinal Survey of Young People's Transitions 

National survey of children aged 14+, funded by DfES, to include a substantial 
(unconfirmed at time of writing) 

STUDY B2 

Family Resources Survey 

http://www.statistics.gov.uk/themes/social_finances/surveys/survey_of_frs.asp 
National survey, using stratified clustered sample of addresses in 1680 postcode 
the next year. Current sample size is 34600 households with a 67% response 
interviews. Focus is predominantly on household income and resources, including 
calculation of equivalised household income and income before and after housing 
Strengths: most extensive example of household income data collection. Includes 
Weakness: limited amount of other information. But could be used as a launch 
of its very high quality income data. Plans to explore linking in other data to 
ways of modelling small area income data. 

STUDY B3 
ECHP 

http://europa.eu.int/comm/eurostat/Public/datashop/print-catalogue/EN?catalogue=Eurostat&theme=3-Population%20and%20Social%20Conditions&product=CA-22-99-765- -N-EN 

A household panel study, covering most of the countries in the EU, from 1994 
by the BHPS but the first three waves were a separate study. The ECHP is 
outcomes for children under 16 and very little process data. The study could 
first three waves of data have so far been released. 

STUDY B4 

New Deal for Communities Evaluation 

http://www.neighbourhood.dtlr.gov.uk/newdeal/index.htm 
Study not yet finally commissioned by DTLR 

The impact of the NDC is likely to be clustered by the 39 NDC areas and their 
Evaluation may include some use of admin data of a longitudinal type. 

STUDY B5 

Youth Lifestyles Survey 

http://www.homeoffice.gov.uk/rds/pdfs/hors209.pdf 

The first YLS took place in 1992/3 and the second between October 1998 and 
representative and consists of 4,848 young people aged 12 to 30 living in private 
Inner city areas and high crime areas were intentionally over-sampled, 
group, sex and social class. Face to face interview lifestyle questions included 
schooling/work/training/unemployment; income and expenditure; family life; 
victimisation; contact with the police. The self completion questionnaire asked 
attitudes towards illegal drugs, and offending. Twenty-seven types of offence 
about. Written permission was obtained from a parent/guardian for those aged 

STUDY B6 

Youth Cohort Studies 

http://www.natcen.ac.uk/research/surveys/research_surveys_ycs.htm 

Repeated short cohort studies from minimum age school leaving to age 18/19, with 

from 14K to 25K. Sampling is clustered by school. 

No data on household of origin income though parental occupation/whether parent 
recorded. 

Strong on qualifications and aspirations; and early labour market entry details 
Limited information on processes or neighbourhood factors 

STUDY B7 

Survey of Poverty and Social Exclusion 

http://www.statistics.gov.uk/themes/social_finances/surveys/survey_of_pse.asp 

Cross Sectional survey based on GHS. Follows earlier approach developed by Mack 
establishing consensus on social necessities and then measuring whether households 
Strong alternative to income based assessments of poverty. Child focused element 
necessary for children. 



One in a series of three studies allowing broad comparisons over time. 

STUDY B8 

Family Fund Trust Database 

http://www.familyfundtrust.org.uk/experience.htm 

Since its formation in 1973 the trust has maintained a database of applicants from 
This now includes records on over 200,000 families. The database includes data on 
difficulties, and information on families such as location, family composition, economic 
ethnicity. The income threshold for applicants is currently £20K pa. 

Families are clearly self selecting and records are at one point in time and not updated, 
base are routed through; 

Dr Bryony Beresford 

Research Fellow 

Social Policy Research Unit 

York University, Heslington, York YO10 5DD 

Email: bab3@york.ac.uk 
STUDY B9 

DfES survey of parents of children aged 0-14: use of childcare 

Survey of about 5,000 households containing children 0-14 every 2 years. 

Clustered sample using Child Benefit as screen. Report by National 
Study likely to be repeated at regular intervals. 

STUDY B10 

DfES survey of parents with children aged 3-4: use of early years 
services 

http://www.dfee.gov.uk/research/re_paper/RR247.pdf 

Clustered sample survey using child benefit as the screen for parents with 3-4 year 
Fifth survey in this series to be published in late 2001. 

STUDY B11 ONS 

Individual wealth and assets study 

ONS is carrying out feasibility work into the possibility of conducting a Wealth and 
been undertaken using the Omnibus Survey, no further details are available at this 

STUDY B12 EHCS 

http://www.housing.dtlr.gov.uk/research/ehcs/index.htm 

The English House Condition Survey is undertaken every 5 years by the DTLR, most 
in January. A full Interview, Physical and Market Value survey for 20,000 addresses 
in 1996, to enable better analysis below national level. Findings should come on stream 
range of topics: housing stock; stock condition; housing quality; household characteristics; 
neighbourhood; disability; local environment quality; property values. Questions are 
circumstances. Households are postcoded and grid referenced, allowing aggregation 
EHCS team state that it is possible to link the data with other geographically referenced 
sets across Government. 
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STUDY B13 

The Children’s Fund 

http://www.dfee.gov.uk/cypu/index.shtml 

The Children's Fund has been established to tackle child poverty and social exclusion. 

Children and Young Persons’ Unit (CYPU) located in the DfES. The Fund will support 
people who are showing early signs of disturbance and provide them and their families 
on track. Its aim is to prevent children falling into drug abuse, truancy, exclusion, unemployment 
worth £450m over three years. Programme elements include preventative work with 
parents, and support for voluntary and community groups working to help those aged 
Wave 1 and more will follow in Waves 2 and 3 in autumn 2001. Each local area will 
will be expected to collect baseline data and more qualitative information for its own 
C. Datasets considered but not of direct use 



C1 

British Crime Survey 

http://www.homeoffice.gov.uk/rds/bcs1.html 

Reasons 

(i) Over 16s only 

(ii) Cross-sectional 

(iii) The BCS is predominantly concerned with crimes 
Offending behaviour is therefore covered only 
Sample size is being raised from 20K to 40K with additional 
3K. 

Data is currently recorded by ACORN neighbourhood classification 
neighbourhood data through e.g. postcode of respondent; 
(NCSR) but survey now transferred to new contractor. 

C2 

General Household Survey 

http://www.statistics.gov.uk/themes/compendia_reference/surveys/survey_of_ghs.asp 

Reasons 

(i) Over 16+ only 

(ii) Cross-sectional survey 

Potential use as a screening sample for more focused studies 

C3 

Labour Force Survey 

http://www.statistics.gov.uk/themes/labour_market/surveys/labour_force_text.asp 

Reasons 

(i) Over 16s only 

(ii) Mainly cross-sectional but with a short longitudinal 
sample design. 

(iii) The LFS LA contains a local authority district code 
size, it is possible to create estimations at district 
together. 
APPENDIX 3 - MODELLING NEIGHBOURHOOD EFFECTS 



In this Appendix, we extend the discussion on estimating neighbourhood effects that 
was started in Chapter 4. We do this by considering the strengths and weaknesses of 
different research designs - and hence different statistical models. 

Let us assume a simple model for the population, simpler than Fig. 2.1. In other words, 
we consider a model that is a convenient simplification of real world processes but we 
do not, at this stage, concern ourselves with the constraints placed on estimating this 
model by the demands for a practical research design. 

Assume we have a measure of income for all families (or households), an outcome 
measure for all children, and a measure that represents the social and economic 
characteristics of the neighbourhood for all these children and families. 

Suppose there is a simple model that links these three measures: 

$$\text{OUTCOME}_{ij} = b_{0j} + b_{1j}\,\text{INCOME}_{ij} + e_{ij} \qquad (1)$$

Here j (j = 1, 2, ..., J) represents the population of neighbourhoods and i (i = 1, 2, ..., $N_j$) 
represents the population of children within neighbourhoods. (We will ignore the fact 
that some families have more than one child, and the clustering that this implies.) In 
other words, OUTCOME is affected by INCOME (together with other variables 
represented by $e_{ij}$). In addition, mean outcomes vary from neighbourhood to 
neighbourhood after allowing for the effects of income (represented by $b_{0j}$). Also, the 
relation between OUTCOME and INCOME can, in principle, vary across 
neighbourhoods ($b_{1j}$). 

We also write: 



$$b_{0j} = b_{00} + b_{01}\,\text{N\_HOOD}_{j} + u_{0j} \qquad (2)$$

In other words, variability in mean neighbourhood OUTCOME can, at least in part, be 
explained by one or more characteristics of the neighbourhood (N_HOOD). We will 
assume these characteristics can be represented by scales such as ID2000 (see 
Section 4.2.1) or mean income for the neighbourhood, $\text{INCOME}_{.j}$. These measures vary 
from area to area but take the same value for all individuals in an area. 

To complete the model, we write: 

$$b_{1j} = b_{10} + u_{1j} \qquad (3)$$

In other words, any variability between neighbourhoods in the relation (or slope) 
between OUTCOME and INCOME is, from our point of view, essentially random. 
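
As a reading aid, equations (2) and (3) can be substituted into equation (1) to give a single combined equation. This is a standard algebraic step rather than something stated in the text above, and it simply regroups the same terms into a fixed part and a random part:

```latex
\begin{align*}
\text{OUTCOME}_{ij}
  &= \left(b_{00} + b_{01}\,\text{N\_HOOD}_{j} + u_{0j}\right)
     + \left(b_{10} + u_{1j}\right)\text{INCOME}_{ij} + e_{ij} \\
  &= \underbrace{b_{00} + b_{01}\,\text{N\_HOOD}_{j} + b_{10}\,\text{INCOME}_{ij}}_{\text{fixed part}}
     + \underbrace{u_{0j} + u_{1j}\,\text{INCOME}_{ij} + e_{ij}}_{\text{random part}}
\end{align*}
```

Written this way, the neighbourhood characteristic enters through $b_{01}$, while $u_{0j}$ and $u_{1j}$ carry the neighbourhood-to-neighbourhood variation that N_HOOD does not explain.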

We want to learn more about the influence of neighbourhoods on child outcomes, 
represented by the size of $b_{01}$. There are, however, potential problems with this model 
even with population data. The problem arises essentially because families choose 
where they live (even though the choice may be very constrained for some) and these 
choice factors could well be related to OUTCOME. Consequently, it is always difficult to 
know whether effects apparently due to neighbourhood are actually reflecting 
unobserved characteristics of individuals that happen to be correlated with 
neighbourhood variables. 

For example, families choosing to live in (or move to) the catchment area of a school 
with a good local reputation are likely also to be families that put a high value on 
educational success for their children and so spend time and money on other 
educational activities. In other words, N_HOOD or, more generally, $u_{0j}$ are endogenous 
variables that need to be explained, rather than exogenous variables that do the 
explaining. The best way of eliminating the endogeneity from the model is to include 
relevant family-level explanatory variables - sometimes known as instruments - in 
equation (1). So, to eliminate (or at least to reduce) the endogeneity arising from 
neighbourhood choices linked to schooling, we could include a measure of the family’s 
attitudes about education. We do, however, have to be careful when using instruments, 
or control variables, in this way because they imply that neighbourhood effects are just 
residual effects that cannot be eliminated by individual effects. Consider, for example, 
levels of air pollution. This is an area variable that is likely to affect health outcomes for 
children. If we include a measure of parental health in the model, we might eliminate the 
pollution effect. We could, however, be throwing the baby out with the bathwater by 
doing so if parents' health were poor because of air pollution. 
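
The selection problem described above can be illustrated with a small simulation. The sketch below is illustrative only: the variable names, the coefficients and the sample size are all assumptions chosen for the example, not estimates from any study. Families sort into neighbourhoods partly on an unobserved attitude to education, and the simulated outcome has no true neighbourhood effect at all. For simplicity it treats the neighbourhood measure as a family-level variable rather than fitting the multilevel model.

```python
import numpy as np

def simulate_selection(n_families=20000, seed=0):
    """Simulate families who sort into neighbourhoods on an unobserved attitude."""
    rng = np.random.default_rng(seed)
    attitude = rng.normal(size=n_families)   # family attitude to education (assumed)
    income = rng.normal(size=n_families)     # standardised family income
    # Neighbourhood quality is partly chosen: it tracks attitude and income.
    n_hood = 0.6 * attitude + 0.3 * income + rng.normal(size=n_families)
    # Assumed true process: NO neighbourhood effect on the outcome.
    outcome = 0.5 * income + 0.8 * attitude + rng.normal(size=n_families)
    return outcome, income, n_hood, attitude

def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

outcome, income, n_hood, attitude = simulate_selection()
naive = ols(outcome, income, n_hood)                # attitude omitted
adjusted = ols(outcome, income, n_hood, attitude)   # attitude included

print("N_HOOD coefficient, attitude omitted: ", round(naive[2], 3))
print("N_HOOD coefficient, attitude included:", round(adjusted[2], 3))
```

Under these assumptions the first coefficient is clearly positive even though neighbourhood has no effect, and the second is close to zero once the family-level attitude measure is included. The air pollution caveat still applies: a control that is itself caused by the neighbourhood would remove a genuine effect in the same way.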

It is not, of course, possible to estimate any of these models for the population. Instead, 
we must make do with sample data and the question then arises as to how useful 
different kinds of sample data might be. 

A useful design is one that selects a sample of neighbourhoods and then selects 
families (or children) from each sampled neighbourhood, sampling being random at 
each stage. Then, with appropriate measures, the above model, which becomes a 
simple multilevel (here two level) model, can be estimated. A particular strength of this 
design is that different indicators of neighbourhood characteristics can be used - those 
that are generated by aggregating measures obtained from individuals (for example, 
perceptions of local crime); those that are measured at the neighbourhood level (for 
example, evidence of vandalism); and those that come from administrative statistics and 
other surveys (proportion receiving Housing Benefit, for example). One possible 
disadvantage of this design is that, to get accurate estimates of the neighbourhood 
effects (which, in turn, require reliable measures of the neighbourhood variables), a 
large sample of both neighbourhoods and individuals is needed. 
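
A minimal sketch of this two-stage design follows, using simulated data. The sample sizes, variances and coefficients are hypothetical. It shows how a neighbourhood indicator can be built by aggregating individual responses, and how a clustered sample allows between-neighbourhood and within-neighbourhood variability to be separated. The variance split uses a simple balanced one-way analysis of variance as a stand-in for a full multilevel estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1: sample neighbourhoods; stage 2: sample children within each one.
n_hoods, n_per_hood = 200, 25                  # hypothetical sample sizes
hood_id = np.repeat(np.arange(n_hoods), n_per_hood)

u0 = rng.normal(scale=0.5, size=n_hoods)       # neighbourhood effect (variance 0.25)
income = rng.normal(size=hood_id.size)
outcome = 1.0 + 0.5 * income + u0[hood_id] + rng.normal(size=hood_id.size)

def hood_mean(values, ids, n_groups):
    """Aggregate an individual-level measure to a neighbourhood mean."""
    totals = np.bincount(ids, weights=values, minlength=n_groups)
    counts = np.bincount(ids, minlength=n_groups)
    return totals / counts

# Neighbourhood indicators generated by aggregating individual measures.
mean_income = hood_mean(income, hood_id, n_hoods)
mean_outcome = hood_mean(outcome, hood_id, n_hoods)

# Balanced one-way ANOVA estimates of the two variance components.
within_ss = np.sum((outcome - mean_outcome[hood_id]) ** 2)
var_within = within_ss / (hood_id.size - n_hoods)
ms_between = n_per_hood * np.var(mean_outcome, ddof=1)
var_between = (ms_between - var_within) / n_per_hood

print("within-neighbourhood variance: ", round(var_within, 3))
print("between-neighbourhood variance:", round(var_between, 3))
```

With only one child per neighbourhood the within-neighbourhood sum of squares would be zero by construction, so this split is only available when the sample is clustered.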

When written down, the model looks just the same as the population model in equations 
(1) to (3) except that j = 1, 2, ..., $J^{(s)}$, where $J^{(s)}$ is the number of neighbourhoods in the sample, 
and i = 1, 2, ..., $n_j$, where $n_j$ is the size of the sample of children in neighbourhood j. It is possible 
to extend the model in at least two ways: 

(1) if we know which services are being used by the family then we can add a further 
level to the model to represent services - most commonly schools - nested 
within neighbourhoods as not all children in a neighbourhood will necessarily 
attend the same school. Quality measures for these services can also be 
incorporated in the model. (Schools are nested within neighbourhoods in most 
sample designs but they are cross-classified with neighbourhoods in the 
population.) 



(2) if we know about all the neighbourhoods the child has lived in over time, or we 
know about changes in the same neighbourhoods, we can represent this in a 
dynamic model. 

There is now an extensive technical and applied literature about multilevel modelling - 
see Plewis (1997) for an introduction and Goldstein (1995) for more advanced material. 

Another design that has been used to estimate neighbourhood effects is as follows: 

$$\text{OUTCOME}_{i} = b_{0} + b_{1}\,\text{INCOME}_{i} + b_{2}\,\text{N\_HOOD}_{i} + e_{i} \qquad (4)$$

This is the model that has been used for studies that are not clustered (e.g. NCDS) and, 
sometimes, for surveys that are clustered but where the clustering is ignored in the 
analysis. The N_HOOD variable is often one that comes from an outside source - 
administrative data at the aggregate level, Census data, or, in principle, another 
survey. Model (4) is the same as model (1) except that, using the notation of equation 
(1), i = 1 for all j so that, in (4), the sample size and the number of areas are the same 
(except for chance overlap). 

A disadvantage of this model is that variability within neighbourhoods cannot be 
separated from variability between neighbourhoods. Consequently, we do not know 
whether, and to what extent, there is between neighbourhood variability in mean 
outcome to explain. A further disadvantage is that variables constructed as aggregates 
of individual responses cannot be used. Moreover, it is not possible to allow the relation 
between OUTCOME and INCOME to vary across neighbourhoods (the $b_{1j}$ in (1)). All 
these disadvantages mean that it is difficult properly to specify the model and therefore 
any estimates obtained, both in terms of their size and their precision, are unlikely to be 
reliable. 

On the other hand, it is now a relatively straightforward task to link aggregate data to 
postcoded survey data and so this design could throw up some clues about 
neighbourhood effects, especially if the variables used account for a substantial 
proportion of the between neighbourhood variability. But, as before, the proper 
specification of the model at the individual level is crucial. 

Model (4) could be extended to include the interaction between INCOME and N_HOOD 
and, if this were important, it would indicate that the relation between OUTCOME and 
INCOME varies across neighbourhoods. However, a proper analysis of this kind of 
variation can only be obtained from a multilevel design, as in equations (1) to (3). 
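
For concreteness, the interaction extension of model (4) might be written as follows. The coefficient name $b_{3}$ is introduced here purely for illustration and does not appear in the original model:

```latex
\begin{align*}
\text{OUTCOME}_{i}
  &= b_{0} + b_{1}\,\text{INCOME}_{i} + b_{2}\,\text{N\_HOOD}_{i}
     + b_{3}\left(\text{INCOME}_{i} \times \text{N\_HOOD}_{i}\right) + e_{i} \\
  &= b_{0} + \left(b_{1} + b_{3}\,\text{N\_HOOD}_{i}\right)\text{INCOME}_{i}
     + b_{2}\,\text{N\_HOOD}_{i} + e_{i}
\end{align*}
```

The second line shows why a non-zero $b_{3}$ would indicate a slope that changes with the neighbourhood measure. Unlike $u_{1j}$ in equation (3), this captures only the part of the slope variation that is a linear function of N_HOOD.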

Sometimes inferences about area effects are made solely on the basis of aggregate 
data. The model is then: 



$$\text{OUTCOME}_{j} = b_{0} + b_{1}\,\text{INCOME}_{j} + b_{2}\,\text{N\_HOOD}_{j} + e_{j} \qquad (5)$$

with j representing area and OUTCOME and INCOME measured, for 
example, as area means. 
In a variant of this model aggregate data for INCOME (and N_HOOD) are 
linked to individual data for OUTCOME. 

These models are attractive if only because they are often the only ones that 
can be estimated, given the availability of data. Estimates from them are, 
however, generally misleading because they seek to estimate processes 
which operate at the individual level by using data that applies only to 
aggregates. In other words, between individual (and between individual within 
area) variations are ignored. This leads to a set of problems, known 
collectively as the 'ecological fallacy' (and discussed by, for example, 
Freedman et al., 1991 and, less pessimistically, by Steel et al., 1996). We are 
not especially interested in the possibility that mean outcomes are better in 
high income areas. What we really want to know is whether child outcomes 
improve as family income rises and whether location makes a further 
difference, and we cannot infer anything about these processes from an 
association between mean income and mean outcome. 
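
The gap between aggregate and individual associations can be seen in a short simulation. This is a sketch under stated assumptions: the sizes, the coefficients and the unobserved area factor are all invented for the example. Within every area the outcome rises by 0.2 per unit of family income, yet an unobserved area characteristic that happens to be correlated with area income makes the regression on area means look much steeper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_areas, n_per_area = 300, 30                  # hypothetical sizes
area = np.repeat(np.arange(n_areas), n_per_area)

area_income = rng.normal(size=n_areas)         # area-level component of income
# Unobserved area factor, correlated with area income (assumed).
area_factor = 0.8 * area_income + rng.normal(scale=0.5, size=n_areas)

income = area_income[area] + rng.normal(size=area.size)
# Assumed individual-level process: family income slope of 0.2.
outcome = 0.2 * income + area_factor[area] + rng.normal(size=area.size)

def area_mean(values):
    """Mean of an individual-level variable within each area."""
    return np.bincount(area, weights=values) / np.bincount(area)

def slope(x, y):
    """Slope of the least-squares line of y on x."""
    xc = x - x.mean()
    return np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)

mean_income, mean_outcome = area_mean(income), area_mean(outcome)

# Model (5): regression using area means only.
aggregate_slope = slope(mean_income, mean_outcome)
# Individual-level relation within areas (deviations from area means).
within_slope = slope(income - mean_income[area], outcome - mean_outcome[area])

print("slope from area means:   ", round(aggregate_slope, 3))
print("slope within areas:      ", round(within_slope, 3))
```

In this setup the aggregate slope comes out several times larger than the within-area slope of about 0.2, and nothing in the area means alone reveals which part reflects family income and which part reflects location.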